This is the first post of a series of three articles in which we will discuss tips and guidelines for successful data science implementations. This post covers the things you should worry about before writing the first line of code. A high-level data...
On Machine Learning and Programming Languages
This article was co-written by Mike Innes (Julia Computing), David Barber (UCL), Tim Besard (UGent), James Bradbury (Salesforce Research), Valentin Churavy (MIT), Simon Danisch (MIT), Alan Edelman (MIT), Stefan Karpinski (Julia Computing), Jon Malmaud (MIT), Jarrett Revels (MIT), Viral Shah (Julia Computing), Pontus Stenetorp (UCL) and...
How To Do User Segmentation Right – A Practical Guide for Data Analysts
What You Will Learn: the different ways to segment users, and a walk-through of a real-world segmentation exercise. There are different ways to segment users: marketers and user researchers typically use interviews or surveys and segment users based on attitudes or intentions. By contrast, data...
In a previous post, we demonstrated how to use the Python 3 library Newspaper to painlessly extract data from news articles. Using Newspaper, I was able to extract text from over 1,000 articles on topics including, but not limited to, Data Science, Artificial Intelligence, and Big Data....
Visual Analytics of Instagram’s #gopro hashtag with AI
Images have become a very common medium of human expression on the internet with the rise of social networks. Facebook is the biggest repository of digital images ever. This trend is only going to intensify given the emergence of image-first platforms like Instagram and...
Some things I’d like you to know about Data Science
Things I’ve learned, mostly by making mistakes. Masses of data + cutting-edge machine learning + cheap compute = Profit. Right? It’s not that simple. Data science isn’t a replacement for asking difficult questions and doing hard work based on the answers. In fact, it’s quite the...
Big aggregate queries can still violate privacy
Suppose you want to prevent your data science team from finding out information about individual customers, but you do want them to be able to get overall statistics. So you implement two policies. Data scientists can only query aggregate statistics, such as counts...
Linked Data and Data Science
The capacity to connect any data source in the world is in our hands today; it is what’s known as the Semantic Web or the Web of Data. When it was invented, the Internet was clearly oriented toward being human-readable; now we need it to be...
Why conversion matters — a toy model
There are often close relationships between top-level business metrics. For instance, it’s well known that retention has a very strong impact on the valuation of a subscription business, or that the percentage of occupied seats is hugely important for an airline. A fun little toy...
Don’t start with a hammer. Find the nail first, then grab the right tool. Or, why technology alone is not enough. From a Fireside Chat at Galvanize SF, July 5, 2017, with Mark Meloon (Caltech, startups, teaches data science), Sean Gerrish (Princeton, Google), and Yunkai Zhou (Tsinghua, Google). See below for...