Q&A with Andreas Mueller, ML vs DL

Editor’s Note: Andreas Mueller is one of the core contributors to scikit-learn, the open source Python-based machine learning library. Over the past few years, the buzz around scikit-learn has grown significantly. Enterprises including Spotify, Evernote and Booking.com use it. With its constantly improving user interface and computing speed, we don’t imagine the appreciation for scikit-learn waning in the near future, and neither does Andreas (Andy). Last week, we got him on the phone to speak about his new book and the future of scikit-learn, and to talk ML vs DL.

Describe your book Introduction to Machine Learning: A Guide for Data Scientists. What makes it different? 

It’s different in that the audience is quite different from that of other books. This is really for engineers and programmers who want to get started doing machine learning. Most machine learning books out there are geared towards computer scientists, people who have a strong statistics/math background. Basically, my book contains nearly no math and a lot of code.

Why did you choose this no-math, all-programming approach?

Because there are a lot of programmers who want to get into data science, and I feel like a lot of the things that are taught in classical books are not that relevant. For example, you don’t need to know how to implement the methods; it’s more important to know how to use them. There are also things that are not usually taught in a machine learning class or book that are really important in practice.
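Editor’s illustration: to give a sense of what "using rather than implementing" can look like, here is a minimal sketch with scikit-learn. The dataset (the built-in iris data) and the model (k-nearest neighbors) are our own choices for illustration and are not taken from the interview or the book.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: flower measurements (X) and species labels (y).
# The choice of dataset is an assumption made for this illustration.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Use an existing method instead of implementing it: construct, fit, predict.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
print(f"Test accuracy: {accuracy:.2f}")
```

The same construct, `fit`, `predict` pattern applies to other scikit-learn estimators, so swapping in a different method usually means changing only the line that creates `model`.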

In your book, do you mainly use examples and walkthroughs to teach?

Actually, it’s more about principles and methods. I try to explain what the different methods are and how they work, but in an intuitive way rather than an in-depth mathematical way.

Where does the book end? What are the next steps for someone who has finished the book?

It depends a lot on what you want to do. If you’re into statistics, you could go down that route, and if you’re more interested in machine learning methods, you could go into deep learning, which is something I talk a little bit about.

How do you view the current state of scikit-learn and its role in machine learning?

I think in the future we’re going to work more on usability, and we’re of course going to work on speed and adding methods. But I think one thing that is particularly important is that more new data scientists are coming on and using the library. That makes it important to have more usability features and more integration with visualization tools.

We also want to make it easier for people to add functionality to what’s already in sklearn. There’s a sklearn contrib project to which people have submitted new tools and methods.

With the skyrocketing popularity of and increasing shift towards deep learning, how does that affect machine learning?

Well, obviously deep learning is better for photo and video data, and for text to some degree, but for a lot of applications the datasets are not big enough to use deep learning. I think that we’ll need both tools, both the deep learning stuff and the more traditional machine learning tools.