Introduction to Evaluating Classification Models

In this post we will describe how to evaluate a predictive model. Why bother building complex predictive models if 5% of the customers will churn anyway? Because a predictive model ranks our clients by the probability that they will abandon the company. It helps answer two questions: 1. How should we optimise our resources? 2. What […]
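
The ranking idea can be made concrete with a cumulative-gains check: sort customers by predicted churn probability and ask what share of all churners falls in the top slice you can afford to contact. Below is a minimal standard-library sketch. The function names `cumulative_gains` and `lift` and the ten-customer sample are hypothetical, not taken from the post.

```python
def cumulative_gains(probs, labels, fraction):
    """Share of all churners found in the top `fraction` of customers,
    ranked by predicted churn probability (highest first)."""
    ranked = sorted(zip(probs, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of customers we contact
    captured = sum(label for _, label in ranked[:k])
    total = sum(labels)
    return captured / total if total else 0.0


def lift(probs, labels, fraction):
    """How many times better than random targeting the model is at this cut-off."""
    return cumulative_gains(probs, labels, fraction) / fraction


# Hypothetical scores for ten customers; 1 = churned, 0 = stayed.
probs = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.3, 0.6, 0.15, 0.4]
labels = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

print(cumulative_gains(probs, labels, 0.2))  # top 2 customers hold 2 of the 3 churners
print(lift(probs, labels, 0.2))
```

Contacting a random 20% of customers would catch about 20% of churners on average, so a lift well above 1 at the cut-off you can afford is what justifies the model.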

How to visualize decision trees in Python

The decision tree classifier is one of the most widely used supervised learning algorithms. Unlike many other classification algorithms, a decision tree classifier is not a black box in the modeling phase. This means we can visualize the trained decision tree to understand how it will classify the given input features. So in this article, you […]
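
The excerpt stops before the article's own code. As a minimal sketch, assuming scikit-learn is installed and using its bundled Iris dataset as a stand-in, `sklearn.tree.export_text` prints the learned splits of a fitted tree as nested rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): the Iris dataset shipped with scikit-learn.
iris = load_iris()

# A shallow tree keeps the printed rules short enough to read.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders each split as an indented "|--- feature <= threshold" line,
# ending in the predicted class at each leaf.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

For a graphical version, `sklearn.tree.plot_tree` draws the same tree with matplotlib; limiting `max_depth` keeps either view readable.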

Streaming Video Analysis in Python

This was originally posted on the Silicon Valley Data Science blog by Matthew Rubashkin, Data Engineer at SVDS, and Colin Higgins, Data Scientist at Vevo. At SVDS we have analyzed Caltrain delays in an effort to use real-time, publicly available data to improve Caltrain arrival predictions. However, the station-arrival time data from Caltrain was not […]

On Building a “Fake News” Classification Model *update

“A lie gets halfway around the world before the truth has a chance to get its pants on.” – Winston Churchill Since the 2016 presidential election, one topic dominating political discourse is the issue of “Fake News”. A number of political pundits claim that the rise of significantly biased and/or untrue news influenced the election, though a study by researchers […]

Breaking Linear Classifiers on ImageNet

You’ve probably heard that Convolutional Networks work very well in practice and across a wide range of visual recognition problems. You may have also read articles and papers that claim to reach a near “human-level performance”. There are all kinds of caveats to that (e.g. see my G+ post on Human Accuracy is not a […]

Implementing a CNN for Text Classification in Tensorflow

The full code is available on GitHub. In this post we will implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (such as sentiment analysis) and has since become a standard baseline for new text […]
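
The heart of the model described in the paper is a set of filters that each span a few consecutive word vectors, followed by max-over-time pooling, so every filter contributes one number regardless of sentence length. The sketch below shows just that operation in plain Python rather than TensorFlow. The 2-dimensional embeddings and filter weights are made-up toy values, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def conv_max_over_time(sentence: List[Vector], filt: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter covering len(filt) consecutive words over the sentence,
    apply ReLU to each window score, and keep the maximum (max-over-time pooling)."""
    h = len(filt)  # filter height: number of words per window
    if len(sentence) < h:
        raise ValueError("sentence is shorter than the filter")
    activations = []
    for start in range(len(sentence) - h + 1):
        window = sentence[start:start + h]
        # Element-wise product of filter and window, summed over words and embedding dims.
        score = bias + sum(
            w * x for f_row, x_row in zip(filt, window) for w, x in zip(f_row, x_row)
        )
        activations.append(max(0.0, score))  # ReLU non-linearity
    return max(activations)


def sentence_features(sentence: List[Vector], filters: List[List[Vector]]) -> List[float]:
    """One pooled feature per filter; filters may span different numbers of words."""
    return [conv_max_over_time(sentence, f) for f in filters]


# Toy 4-word sentence with 2-dimensional word vectors (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]                # spans 2 words
trigram_filter = [[1.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]  # spans 3 words

print(sentence_features(sentence, [bigram_filter, trigram_filter]))  # [2.0, 1.0]
```

In the paper's architecture, many filters of several widths are learned, and their pooled outputs are concatenated and fed to a fully connected softmax layer with dropout; the sketch leaves all of that out.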

Over-Optimising: A Story about Kaggle

I recently took a stab at a Kaggle competition. The premise was simple: given some information about insurance quotes, predict whether the customer who requested the quote will follow through and buy the insurance. A straightforward classification problem: data already clean and in one place, and a clear scoring metric (area under the ROC curve). […]
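
Since everything in that competition hinged on the area under the ROC curve, it helps to recall what the metric measures: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. Here is a minimal, deliberately quadratic standard-library sketch; `roc_auc` is a hypothetical name and the four scores are toy values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
print(roc_auc(labels, scores))  # 0.75: 3 of the 4 positive/negative pairs are ordered correctly
```

Because only the ordering of scores matters, any monotonic rescaling of the predictions leaves the AUC unchanged; for real datasets `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.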

Deutsch Credit Future Telling: part 2

To continue on this first path, it’s logical to proceed with hyperparameter tuning of the three algorithms mentioned in part 1. Here the Random Forest Classifier (RFC) pulls ahead with 77% accuracy, while the other two remain around 75%. Where there were three on this road, there is now one. The next step […]
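
The excerpt does not show the tuning code. One common way to run such a search is scikit-learn's `GridSearchCV`. The sketch below assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in, with a small, arbitrarily chosen grid, so its scores say nothing about the 77% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): 500 rows, 20 features, roughly 70/30 classes.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.7, 0.3], random_state=0)

# A small illustrative grid; real searches usually cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

# 5-fold cross-validated accuracy for every combination in the grid.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keep a held-out test set aside as well, since the best cross-validated score from a search is slightly optimistic.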

Deutsch Credit Future Telling: part 1

Classification tasks come up frequently in data science, but the hardest are those with unbalanced classes. From biology to finance, real-life examples are numerous. Before balancing your errors, establish a baseline: simply predicting the most frequent class can give you over 90% accuracy right off the bat. The question of whether it is worse to have […]
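
The majority-class baseline is easy to compute, and so is its catch: it never predicts the minority class. A minimal standard-library sketch on a hypothetical 95/5 label split; the helper names are illustrative.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def recall(labels, predictions, positive):
    """Fraction of actual `positive` examples that were predicted as `positive`."""
    actual = [p for y, p in zip(labels, predictions) if y == positive]
    return sum(1 for p in actual if p == positive) / len(actual) if actual else 0.0


# Hypothetical unbalanced data: 95 negatives (0), 5 positives (1).
labels = [0] * 95 + [1] * 5
majority, accuracy = majority_baseline(labels)
predictions = [majority] * len(labels)

print(accuracy)                                 # 0.95
print(recall(labels, predictions, positive=1))  # 0.0: every positive is missed
```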