Predicting Resignation in the Military

In the 2015 hackathon organized by Singapore’s Ministry of Defense, one of the tasks was to predict resignation rates in the military, using anonymized data on 23,000 personnel that included their age, military rank, and years in service, as well as performance indicators such as salary increments and promotions. Our team placed 3rd overall. In this […]

Prophet is Data Science not Statistics, and there is a Difference

Facebook’s Prophet forecasting tool illustrates the distinction between the traditional statistical approach and the newer machine learning/data science paradigm. This distinction is cultural: it seems that the motivation behind Prophet was to quickly make accurate forecasts (predictions), instead of getting bogged down in building models satisfying certain theoretical properties, which may or may not yield useful results. The […]

Introduction to Evaluating Classification Models

In this post we will describe how to evaluate a predictive model. Why bother creating complex predictive models if 5% of the customers will churn anyway? Because a predictive model will rank our clients based on the probability that they will abandon the company. It helps answer these two questions: 1. How should we optimise our resources? 2. What […]

What Is Predictive Analytics (and Why Do You Need It)?

Try this statistic on for size: The 500 petabytes of digital healthcare data that existed in 2012 is predicted to reach 25,000 petabytes by the year 2020. That’s an increase of nearly 50 times the amount of data from just eight years prior! Healthcare marketers may be swimming in data, but what’s important is to […]

Amazon will make $41B this Holiday Season! Forecasting Quarterly Revenue

The holiday shopping season is in full swing! The economy is relatively strong compared to a few years back, so retail sales are probably going to be strong, especially for Amazon. Other retailers like Target and Wal-Mart are also running amazing Black Friday and holiday sales to attract customers. However, Amazon has consistently shown […]

Ad Hoc Distributed Random Forests #4

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We use this ability combined with scikit-learn to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad-hoc […]

Predictive Modeling, Supervised Machine Learning, and Pattern Classification

When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big picture of pattern classification, in order to put my previous topics into context and to provide an introduction for the future topics that are going to follow. Pattern […]

Predictive analytics is not enough

The idea of predictive analytics can seem like magic: how, really, can a computer predict the future? Yet we’ve seen a lot of success based on this advanced technology in recent years, from Netflix to Amazon, Google, and more. These companies mine a massive amount of data every day for patterns, and it drives massive […]

Deutsch Credit Future Telling: part 2

To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1. Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy while the other two are still around 75%. Where there were three on this road, there is now one. The next step […]