
Every week we bring you a selection of the best data science articles we find in Cyberspace.

We start with high school students' writings on AI, then move on to lessons learned by one of the leading Machine Learning experts, building bots without programming, and an intro to probabilistic programming, and finally take a look at some unexpected applications of Neural Networks.

A year of artificial intelligence

Rohan Kapur @MCKapur and Lenny Khazan @LennyKhazan are high school students from Singapore and NYC. In 2015 they released their Contra app, a platform for millennials, on Apple’s App Store, and in 2016 they decided to take on … Artificial Intelligence. The result of their ambitious endeavour is a series of extremely well-written articles on Deep Learning published on Medium.

For instance:


In Autoencoders and Word Embeddings, Lenny’s second article, you’ll find a clear presentation of the various types of Autoencoders (Denoising, Variational, Sequence-to-sequence), how they are used for dimensionality reduction and how they are applied for word embeddings in models like word2vec.
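To give a feel for the dimensionality reduction idea, here is a minimal sketch in plain Python. It is not code from Lenny's article: the linear, tied-weight autoencoder, the toy data lying on the line y = 2x, and all names and hyperparameters are assumptions chosen for illustration. The network squeezes each 2-D point into a single number (the code) and learns weights that let it rebuild the point from that number alone.

```python
def encode(w, x):
    """Compress a 2-D point into a 1-D code (the bottleneck)."""
    return w[0] * x[0] + w[1] * x[1]

def decode(w, code):
    """Expand the 1-D code back to 2-D, reusing the same (tied) weights."""
    return [w[0] * code, w[1] * code]

def reconstruction_error(w, data):
    """Sum of squared differences between inputs and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2
    return total

def train(data, w, learning_rate=0.01, epochs=500):
    """Batch gradient descent on the reconstruction error."""
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            code = encode(w, x)
            x_hat = decode(w, code)
            r = [x[0] - x_hat[0], x[1] - x_hat[1]]  # residual
            w_dot_r = w[0] * r[0] + w[1] * r[1]
            for j in range(2):
                # Derivative of ||x - w (w.x)||^2 with respect to w[j]
                grad[j] += -2.0 * (code * r[j] + w_dot_r * x[j])
        w = [w[j] - learning_rate * grad[j] for j in range(2)]
    return w

# Hypothetical toy data: 2-D points that really only have one degree of freedom.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

initial_w = [0.5, 0.5]
trained_w = train(data, initial_w)

initial_error = reconstruction_error(initial_w, data)
final_error = reconstruction_error(trained_w, data)
```

After training, the weights should point along the direction of the data, so one number per point is enough to reconstruct it. Real autoencoders add nonlinear layers and far more dimensions, but the bottleneck principle is the same.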

In The vanishing gradient problem, Rohan traces the slow convergence of the first layers of feedforward networks to the nature of the sigmoid activation function and introduces well-known alternatives such as the Rectified Linear Unit (ReLU) activation function. Great analysis.
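The core of the argument can be sketched in a few lines of plain Python. This is an illustration, not Rohan's code, and it makes two loud assumptions: every weight is 1 (so only the activation derivatives matter) and every layer sees the same hypothetical pre-activation value of 0.5. During backpropagation the gradient reaching the first layer is a product of one activation derivative per layer, and the sigmoid's derivative never exceeds 0.25.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def gradient_scale(activation_grad, pre_activations):
    """Multiply the local derivatives along one path through the network."""
    scale = 1.0
    for z in pre_activations:
        scale *= activation_grad(z)
    return scale

# Hypothetical pre-activation values, one per layer, for a 10-layer path.
pre_activations = [0.5] * 10

sigmoid_scale = gradient_scale(sigmoid_grad, pre_activations)
relu_scale = gradient_scale(relu_grad, pre_activations)

print(f"sigmoid: {sigmoid_scale:.2e}   relu: {relu_scale:.2e}")
```

Under these assumptions the sigmoid path shrinks the gradient by several orders of magnitude over ten layers, while the ReLU path passes it through unchanged for positive inputs. That is why the earliest layers of a deep sigmoid network learn so slowly.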

What makes these articles different from others you’d find on Deep Learning is that these two gifted authors share their own questions, understandings and representations of the concepts and inner workings of Deep Learning. It’s not so much about the theory or the implementation; it’s about the why and how … and again the why.

The whole series, 8 articles so far, is available on Medium: A year of Artificial Intelligence.

Ten Lessons Learned from Building real-life impactful Machine Learning Systems

Xavier Amatriain, former Director of Netflix Recommendation Algorithms, is now VP of Engineering at Quora. Although he is now mostly active on Quora, his former blog TechnoCalifornia is a treasure trove of high-quality data science articles.

One of his latest articles is a reflection on the lessons learned in building real data science applications during his years at Netflix. Xavier shares his years of experience on how Machine Learning models interact with systems, data, and users in order to achieve a really valuable impact. He addresses issues such as the volume of data, complexity of models and model optimization, presentation bias, users and UI, metrics choice, … More than great advice, these are lessons on issues every data scientist should be familiar with when launching data projects into production.

Resume Bots


Esther Crawford @EstherCrawford, a Product Marketer, took on the chatbot frenzy and created a chatbot around her personal resume. The idea is to go beyond the LinkedIn profile or standard resume format to help recruiters and applicants alike get a better understanding of a person’s experience, their context and the cultural fit between the applicant and the company.

The result, named EstherBot, is a personal resume bot that can tell you about Esther’s career, interests, and values.

What’s interesting is that Esther was able to build a chatbot based solely on existing platforms. After trying several of these platforms, she settled on Facebook Messenger, Telegram and SMS as means of interaction with her bot and used Smooch.io to build it. In a follow-up article she shares her experience and unexpected findings. Now you too can build your own personal bot!

Anomaly Detection & Probabilistic Programming

Probabilistic Programming is a new approach to machine learning based on Bayesian inference, with the promise of being more interpretable than black-box algorithms. Bayesian models also enable a more action-focused approach by offering complete probability distributions and, as a consequence, meaningful confidence intervals as results. Probabilistic programming languages, built with Bayesian algorithms at their core, bring the power of programming to Bayesian inference. Very powerful tools!

In this article and associated Jupyter Notebook, the team at Fast Forward Labs, a machine intelligence research company in NYC, compares standard methods in anomaly detection and shows how probabilistic programming can provide an easy way to formulate more robust anomaly detection models.
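As a rough illustration of the Bayesian view of anomaly detection, here is a minimal sketch in plain Python. It is not the Fast Forward Labs notebook and it does not use a probabilistic programming language: the model (a Normal prior on the mean, with a known noise level) is simple enough that the posterior can be written down by hand, whereas a probabilistic programming language would run the inference for you on much richer models. The data, prior values, threshold and function names are all hypothetical.

```python
from statistics import NormalDist, fmean

def posterior_predictive(data, prior_mean, prior_sd, noise_sd):
    """Normal prior on the mean + Normal noise with known sd (conjugate update)."""
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision + fmean(data) * data_precision) / post_precision
    post_var = 1.0 / post_precision
    # A new observation carries both the uncertainty on the mean and the noise.
    return NormalDist(post_mean, (post_var + noise_sd ** 2) ** 0.5)

def tail_probability(predictive, x):
    """Two-sided probability of seeing a value at least as extreme as x."""
    z = abs(x - predictive.mean) / predictive.stdev
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Hypothetical sensor readings observed so far.
history = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1]
predictive = posterior_predictive(history, prior_mean=10.0, prior_sd=5.0, noise_sd=0.2)

# Flag new points that the model considers very unlikely (threshold is an assumption).
new_points = [10.15, 12.0]
flags = [tail_probability(predictive, x) < 0.01 for x in new_points]
```

Because the result is a full predictive distribution rather than a single score, the threshold has a direct probabilistic meaning, which is the kind of interpretability the Bayesian approach promises.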

New and unexpected applications of Deep Learning

Last week we wrote about how @stitchfix designs new clothes with Autoencoders. Well, this week we found a flurry of new applications of Deep Learning:

  • Pavel Gonchar uses TensorFlow to colorize grayscale images. He shares his approach and code on GitHub, with pretty impressive results.
  • Deep Dreams: In June 2015, Google released DeepDream, a collection of IPython notebooks on the internal representation of neural networks, resulting in fascinating psychedelic images generated by the deep layers of Neural Networks. You’ll find all the resources you need to make your own deep dream images in this Reddit article.
  • Sketch Simplification: Rough sketches allow artists to quickly render their ideas on paper. These sketches need to be cleaned up into simplified drawings ready for production. Researchers from Waseda University in Japan use a convolutional neural network to simplify rough sketches into a version that is ready for vectorization.


  • Auto-Generating Clickbait With Recurrent Neural Networks: A lot of headlines on the web these days are written to excite your curiosity and boost click rates. Lars Eidnes uses Recurrent Neural Networks to generate these types of clickbait titles, with surprisingly good results.

To read more from Alex, sign up for our newsletter or follow him on Twitter @alexip.

Alex Perrier


Lead Data Scientist focused on Natural Language Processing and Predictive Modeling, with a background in stochastic processes and signal processing and extensive experience in agile software development. I recently authored a book on AWS Machine Learning with Packt Pub. I am a creative start-up co-founder with clear communication skills and project management and business development experience. Team lead and team builder.