Introduction to Evaluating Classification Models

In this post, we will describe how to evaluate a predictive model. Why bother creating complex predictive models if 5% of the customers will churn anyway? Because a predictive model will rank our clients based on the probability that they will abandon the company. It helps answer these two questions: 1. How should we optimise our resources? 2. What […]
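As a minimal sketch of that ranking idea (the column names are made up and a logistic regression stands in for whatever model the post actually uses):

```python
# Hypothetical churn data; rank customers by predicted probability of leaving
# so retention effort can be spent on the highest-risk customers first.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "tenure_months": [2, 48, 12, 36, 5, 60],
    "monthly_spend": [80, 20, 55, 30, 95, 15],
    "churned":       [1, 0, 1, 0, 1, 0],
})

X, y = df[["tenure_months", "monthly_spend"]], df["churned"]
model = LogisticRegression().fit(X, y)

# Highest predicted churn probability first.
df["churn_prob"] = model.predict_proba(X)[:, 1]
print(df.sort_values("churn_prob", ascending=False))
```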

Water World

I live in Utah, an extremely dry state. Like much of the western United States, Utah is experiencing water stress from increasing demand, episodes of drought, and conflict over water rights. At the same time, Utahns use a lot of water per capita compared to residents of other states. According to the United States Geological […]

A survey of cross-lingual embedding models

In past blog posts, we discussed different models, objective functions, and hyperparameter choices that allow us to learn accurate word embeddings. However, these models are generally restricted to capturing representations of words in the language they were trained on. The availability of resources, training data, and benchmarks in English leads to a disproportionate focus on […]

ODSC East Interviews: Barton Poulson

The following Q&A is part of a series of interviews conducted with speakers at the 2017 ODSC East conference in Boston. The interview has been condensed and edited for clarity. This interview is with Barton Poulson, Founder at Datalab, whose talk was entitled “Data Science for the 99%”. What does Data Science for the 99% mean? Most of […]

Julia 0.5 Highlights

To follow along with the examples in this blog post and run them live, you can go to JuliaBox, create a free login, and open the “Julia 0.5 Highlights” notebook under “What’s New in 0.5”. The notebook can also be downloaded from here. Julia 0.5 is a pivotal release. It introduces more transformative features than […]

Playing with Randall Munroe’s XKCD handwriting

The XKCD font (as used by matplotlib et al.) got an update to include lower-case characters. For some time now I have been aware of a handwriting sample produced by Randall Munroe (XKCD’s creator) that I was interested in exploring. The ultimate aim is to automatically produce a font-file using open source tools, and to […]
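For context, matplotlib already exposes the XKCD drawing style via `plt.xkcd()`; a minimal sketch of that (it is not the font-generation pipeline the post works toward) might look like:

```python
# Plot in matplotlib's built-in XKCD style; the XKCD/Humor Sans font is used
# if it is installed, otherwise matplotlib falls back to a default font.
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

with plt.xkcd():
    plt.plot(x, np.sin(x), label="sin(x)")
    plt.xlabel("time")
    plt.ylabel("enthusiasm")
    plt.legend()
    plt.savefig("xkcd_demo.png")
```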

Python as a way of thinking

This article contains supporting material for this blog post at Scientific American. The thesis of the post is that modern programming languages (like Python) are qualitatively different from the first generation (like FORTRAN and C), in ways that make them effective tools for teaching, learning, exploring, and thinking. I presented a longer version of this argument […]

Cognitive Machine Learning: Prologue

Sources of inspiration are one thing we do not lack in machine learning. This is what, for me at least, makes machine learning research such a rewarding and exciting area to work in. We gain inspiration from our traditional neighbors in statistics, signal processing and control engineering, information theory and statistical physics. But our fortune continues, and we […]

Statistics, Simians, the Scottish, and Sizing up Soothsayers

A predictive model can be a parametrized mathematical formula, or a complex deep learning network, but it can also be a talkative cab driver or a slides-wielding consultant. From a mathematical point of view, they are all trying to do the same thing: to predict what’s going to happen, so they can all be evaluated […]