On Building a “Fake News” Classification Model *update

“A lie gets halfway around the world before the truth has a chance to get its pants on.” – Winston Churchill

Since the 2016 presidential election, one topic dominating political discourse is the issue of “Fake News”. A number of political pundits claim that the rise of significantly biased and/or untrue news influenced the election, though a study by researchers […]

Breaking Linear Classifiers on ImageNet

You’ve probably heard that Convolutional Networks work very well in practice and across a wide range of visual recognition problems. You may have also read articles and papers that claim to reach a near “human-level performance”. There are all kinds of caveats to that (e.g. see my G+ post on Human Accuracy is not a […]

Implementing a CNN for Text Classification in TensorFlow

The full code is available on GitHub. In this post we will implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become a standard baseline for new text […]

Over-Optimising: A Story about Kaggle

I recently took a stab at a Kaggle competition. The premise was simple: given some information about insurance quotes, predict whether or not the customer who requested the quote will follow through and buy the insurance. A straightforward classification problem, with the data already clean and in one place and a clear scoring metric (area under the ROC curve). […]

Deutsch Credit Future Telling: part 2

To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1. Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy while the other two are still around 75%. Where there were three on this road, there is now one. The next step […]

Deutsch Credit Future Telling: part 1

Classification tasks come up frequently in data science, but the hardest are those with unbalanced classes. From biology to finance, real-life examples are numerous. Before balancing your errors, establishing a baseline that always predicts the most frequent class can give you over 90% accuracy right off the bat. The question of whether it is worse to have […]
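
The majority-class baseline mentioned in this excerpt can be illustrated with a minimal sketch. The label counts below are hypothetical and chosen only for illustration; they are not taken from the credit data set the post discusses, and the helper name is an assumption of this sketch.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical imbalanced data set: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
label, accuracy = majority_baseline(labels)
print(f"always predict {label}: accuracy = {accuracy:.2f}")
```

A baseline like this never identifies a single minority-class example, which is why accuracy alone can be misleading on unbalanced classes.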