Distinguishing between Statistical Modeling and Machine Learning

Editor’s note: This article will serve as a great overview. After reading it, we recommend listening to the podcast at the bottom; it may just broaden your understanding. If you are looking for it, here is one framework to distinguish statistical modeling from machine learning, and it is based on the desire for interpretability. In summary, if you […]

Vector Models in Machine Learning Part 2

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high-dimensional spaces. In the first part, I went through some examples of why vector models are useful. In the second […]

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning – Part 2

Bootstrapping and Uncertainties: Introduction. In the previous article (Part I), we introduced the general ideas behind model evaluation in supervised machine learning. We discussed the holdout method, which helps us to deal with real-world limitations such as limited access to new, labeled data for model evaluation. Using the holdout method, we split our dataset […]

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning – Part 1

Introduction: Machine learning has become a central part of our lives – as consumers, customers, and hopefully as researchers and practitioners! Whether we are applying predictive modeling techniques to our research or business problems, I believe we have one thing in common: We want to make “good” predictions! Fitting a model to our training data […]

Beyond One-Hot: An Exploration of Categorical Variables

In machine learning, data is king. The algorithms and models used to make predictions with the data are important, and very interesting, but ML is still subject to the idea of garbage-in-garbage-out. With that in mind, let’s look at a little subset of those input data: categorical variables. Categorical variables (wiki) are those that represent a […]

An Intuitive Explanation of Convolutional Neural Networks

Convolutional Neural Networks (ConvNets or CNNs) are a category of Neural Networks that have proven very effective in areas such as image recognition and classification. ConvNets have been successful in identifying faces, objects, and traffic signs, apart from powering vision in robots and self-driving cars. Figure 1: Source [1] In Figure 1 above, a ConvNet is able […]

Interacting with ML Models

The main difference between data analysis today and data analysis a decade or two ago is the way that we interact with it. Previously, the role of statistics was primarily to extend our mental models by discovering new correlations and causal rules. Today, we increasingly delegate parts of our reasoning processes to algorithmic models that live […]

Dataset Shift in Machine Learning

Introduction: DataRobot’s Peter Prettenhofer gave an engrossing talk at the recent ODSC UK conference on the problem of dataset shift in Machine Learning. His introduction consisted of a brief touch on the mathematics of supervised learning and an outline of dataset shift. An interactive illustration served as a wonderful visual display of the problem. Mr. […]

When Machine Learning Matters

I joined Spotify in 2008 to focus on machine learning and music recommendations. It’s easy to forget, but Spotify’s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. (The other key differentiator was licensing — until early 2009 Spotify basically […]