Visual Analytics of Instagram’s #gopro hashtag with AI
Images have become a very common medium of human expression on the internet with the rise of social networks. Facebook is the biggest repository of digital images ever. This trend is only going to intensify given the emergence of image-first platforms like Instagram and Snapchat, also called...
The Last 5 Years In Deep Learning
Introduction: As we near the end of 2017 (and, coincidentally, the first day of NIPS 2017), we've reached the five-year mark of deep learning really starting to hit the mainstream. For me, AlexNet and the 2012 ImageNet competition were the coming-out party (although researchers have definitely been working...
Some things I’d like you to know about Data Science
Things I’ve learned mostly by making mistakes. Masses of data + cutting edge machine learning + cheap compute = Profit. Right? It’s not that simple. Data science isn’t a replacement for asking difficult questions and doing hard work based on the answers. In fact, it’s quite the opposite. Enabled by...
CatBoost: Yandex’s machine learning algorithm is available free of charge
Russia’s Internet giant Yandex has launched CatBoost, an open-source machine learning library. The algorithm has already been integrated by the European Organization for Nuclear Research to analyze data from the Large Hadron Collider, the world’s most sophisticated experimental facility. Machine learning helps make decisions by analyzing data and...
A conversation with Thomas Wiecki on the use of probabilistic programming and machine learning in quant finance.
Not surprisingly, hedge funds, and especially quant funds, are notorious for being secretive about the algorithms they employ to beat the market. A Boston-based startup is taking a different approach. Thomas Wiecki is the Head of Research at Quantopian, which hosts an open-source platform that allows anyone...
You weren’t supposed to actually implement it, Google
Last month, I wrote a blog post warning about how, if you follow popular trends in NLP, you can easily accidentally make a classifier that is pretty racist. To demonstrate this, I included the very simple code, as a “cautionary tutorial.” The post got a fair amount of reaction. Much...
A decade of using text-mining for citation function classification
Academic work is typically filled with references to previous work. Unfortunately, most of these references have, at best, a tangential relevance. Thus you cannot trust that a paper that cites another actually “builds on it”. A more likely scenario is that the authors of the latest paper did not...
Big aggregate queries can still violate privacy
Suppose you want to prevent your data science team from being able to find out information on individual customers, but you do want them to be able to get overall statistics. So you implement two policies. Data scientists can only query aggregate statistics, such as counts and averages. These...
Linked Data and Data Science
Understanding Gender Roles in Movies with Text Mining
I have a new visual essay up at The Pudding, using text mining to explore how women are portrayed in film. In April 2016, we broke down film dialogue by gender. The essay presented an imbalance in which men delivered more lines than women across 2,000 screenplays. But quantity of lines...