Classification and Clustering Algorithms

Classification and C...

A famous dialogue you could listen from the data science people. It could be true if we add it’s so challenging at the end of the dialogue. The foremost challenge starts from  categorising the problem itself. The first level of categorising could be whether supervised or unsupervised learning. The next level is what kind of algorithms to get […]

Practical Naive Bayes — Classification of Amazon Reviews

Practical Naive Baye...

If you search around the internet looking for applying Naive Bayes classification on text, you’ll find a ton of articles that talk about the intuition behind the algorithm, maybe some slides from a lecture about the math and some notation behind it, and a bunch of articles I’m not going to link here that pretty much just […]

2 Ways to Implement Multinomial Logistic Regression in Python

2 Ways to Implement ...

Logistic regression is one of the most popular supervised classification algorithm. This classification algorithm mostly used for solving binary classification problems. People follow the myth that logistic regression is only useful for the binary classification problems. Which is not true. Logistic regression algorithm can also use to solve the multi-classification problems. So in this article, your are going to […]

A Gentle Introduction to Recommender Systems with Implicit Feedback

A Gentle Introductio...

Recommender systems have become a very important part of the retail, social networking, and entertainment industries. From providing advice on songs for you to try, suggesting books for you to read, or finding clothes to buy, recommender systems have greatly improved the ability of customers to make choices more easily. Why is it so important […]

Introduction to Evaluating Classification Models

Introduction to Eval...

In this post we will describe how to evaluate a predictive model. Why bother creating complex predictive models if 5% of the customers will churn anyway? Because a predictive model will rank our clients based on the probability that they  will abandon the company. It helps answer these two questions: 1. How should we optimise our resources? 2.  What […]

How to visualize decision trees in Python

How to visualize dec...

Decision tree classifier is the most popularly used supervised learning algorithm. Unlike other classification algorithms, decision tree classifier in not a black box in the modeling phase.  What that’s means, we can visualize the trained decision tree to understand how the decision tree gonna work for the give input features. So in this article, you […]

Streaming Video Analysis in Python

Streaming Video Anal...

This was originally posted on the Silicon Valley Data Science blog by authors Matthew Rubashkin Data Engineer at SVDS, and Colin Higgins, Data Scientist at Vevo. At SVDS we have analyzed Caltrain delays in an effort to use real time, publicly available data to improve Caltrain arrival predictions. However, the station-arrival time data from Caltrain was not […]

On Building a “Fake News” Classification Model *update

On Building a “...

“A lie gets halfway around the world before the truth has a chance to get its pants on.” – Winston Churchill Since the 2016 presidential election, one topic dominating political discourse is the issue of “Fake News”. A number of political pundits claim that the rise of  significantly biased and/or untrue news influenced the election, though a study by researchers […]

Breaking Linear Classifiers on ImageNet

Breaking Linear Clas...

You’ve probably heard that Convolutional Networks work very well in practice and across a wide range of visual recognition problems. You may have also read articles and papers that claim to reach a near “human-level performance”. There are all kinds of caveats to that (e.g. see my G+ post on Human Accuracy is not a […]