Introduction to Evaluating Classification Models

In this post we will describe how to evaluate a predictive model. Why bother creating complex predictive models if 5% of the customers will churn anyway? Because a predictive model will rank our clients based on the probability that they will abandon the company. It helps answer these two questions: 1. How should we optimise our resources? 2. ...

Water World

I live in Utah, an extremely dry state. Like much of the western United States, Utah is experiencing water stress from increasing demand, episodes of drought, and conflict over water rights. At the same time, Utahns use a lot of water per capita compared to residents of other states. According to the United States Geological Survey, in 2014 people ...

A survey of cross-lingual embedding models

In past blog posts, we discussed different models, objective functions, and hyperparameter choices that allow us to learn accurate word embeddings. However, these models are generally restricted to capture representations of words in the language they were trained on. The availability of resources, training data, and benchmarks in English leads to ...

ODSC East Interviews: Barton Poulson

The following Q&A is part of a series of interviews conducted with speakers at the 2017 ODSC East conference in Boston. The interview has been condensed and edited for clarity. This interview is with Barton Poulson, Founder at Datalab, whose talk was entitled "Data Science for the 99%". What does Data Science for the 99% mean? Most of ...

Julia 0.5 Highlights

To follow along with the examples in this blog post and run them live, you can go to JuliaBox, create a free login, and open the “Julia 0.5 Highlights” notebook under “What’s New in 0.5”. The notebook can also be downloaded from here. Julia 0.5 is a pivotal release. It introduces more transformative features than any release since the first ...

Playing with Randall Munroe’s XKCD handwriting

The XKCD font (as used by matplotlib et al.) got an update to include lower-case characters. For some time now I have been aware of a handwriting sample produced by Randall Munroe (XKCD's creator) that I was interested in exploring. The ultimate aim is to automatically produce a font-file using open source tools, and to learn a few things along ...

Python as a way of thinking

This article contains supporting material for this blog post at Scientific American. The thesis of the post is that modern programming languages (like Python) are qualitatively different from the first generation (like FORTRAN and C), in ways that make them effective tools for teaching, learning, exploring, and thinking. I presented a longer ...

Cognitive Machine Learning: Prologue

Sources of inspiration are one thing we do not lack in machine learning. This is what, for me at least, makes machine learning research such a rewarding and exciting area to work in. We gain inspiration from our traditional neighbors in statistics, signal processing and control engineering, information theory and statistical physics. But our ...

Statistics, Simians, the Scottish, and Sizing up Soothsayers

A predictive model can be a parametrized mathematical formula or a complex deep learning network, but it can also be a talkative cab driver or a slide-wielding consultant. From a mathematical point of view, they are all trying to do the same thing: predict what's going to happen. So they can all be evaluated in the same way. Let's look at ...