The normalcy of online learning: the more you study, the better you do
Online learning, after all, is just a form of learning: time spent studying is one of the best predictors of success. Both the pattern and its exceptions can be seen quite clearly in the Open University Learning Analytics dataset, which collects anonymized data about the personal characteristics and,... Read more
What’s the difference between data science, machine learning, and artificial intelligence?
When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?” I’ve responded enough times that my answer easily qualifies for my “rule of three”: When you’ve... Read more
Ethics for powerful algorithms (2 of 4)
This is the second of four articles on the ethics of powerful algorithms, taking COMPAS as a case study. Our story so far: COMPAS is an algorithm used widely today to predict which criminals are most likely to commit future crimes. Investigative journalists at ProPublica recently published a... Read more
My big obsession of 2018 so far is the sports prediction platform Throne AI. There’s no better way to describe it than as Kaggle for sports. The platform provides users with data that they use to build models to predict the outcome of sports matches. Each league on... Read more
Plotting author statistics for Git repos using Git of Theseus
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, which is a tool (written... Read more
Deep(ly) Unsettling: The ubiquitous, unspoken business model of AI-induced mental illness
“The junk merchant,” wrote William S. Burroughs, “doesn’t sell his product to the consumer, he sells the consumer to his product. He does not improve and simplify his merchandise. He degrades and simplifies the client.” He might as well have been describing the commercial, AI-mediated, social-network-driven... Read more
Optimization for Deep Learning Highlights in 2017
Table of contents: Improving Adam; Decoupling weight decay; Fixing the exponential moving average; Tuning the learning rate; Warm restarts; SGD with restarts; Snapshot ensembles; Adam with restarts; Learning to optimize; Understanding generalization. Deep Learning ultimately is about finding a minimum that generalizes well, with bonus... Read more
Happy, Healthy, Hungry. Mapping San Francisco Restaurant Cleanliness
Somewhat recently, Yelp announced that it is partnering with Code for America and the City of San Francisco to develop LIVES, an open data standard which allows municipalities to publish restaurant inspection data in a standardized format. This is a step towards allowing a much much... Read more
R, as I’ve pointed out before, has a package discovery problem. There’s a new package, colorblindr, which lets you see the impact of various sorts of colour-blindness on a colour palette, a very useful thing for designing good graphics. When it’s mentioned on Twitter, you see lots... Read more
Tendencies of Data Engineers and Scientists
A long time ago I wrote a short post on the differences between data engineers and data scientists. My reasoning back then was that a data engineer is someone who applies engineering methodologies to data problems, while a data scientist is someone who applies the scientific method... Read more