What Is Predictive Analytics (and Why Do You Need It)?

Try this statistic on for size: The 500 petabytes of digital healthcare data that existed in 2012 is predicted to reach 25,000 petabytes by the year 2020. That’s 50 times the amount of data from just eight years prior! Healthcare marketers may be swimming in data, but what’s important is to […]

Amazon will make $41B this Holiday Season! Forecasting Quarterly Revenue

The holiday shopping season is in full swing! The economy is relatively strong compared to a few years back, so retail sales will probably be strong, especially for Amazon. Other retailers like Target and Wal-Mart are also running amazing Black Friday and holiday sales to attract customers. However, Amazon has consistently shown […]

Ad Hoc Distributed Random Forests #4

When arrays and dataframes aren’t flexible enough. TL;DR: Dask.distributed lets you submit individual tasks to the cluster. We use this ability combined with scikit-learn to train and run a distributed random forest on distributed tabular NYC Taxi data. Our machine learning model does not perform well, but we do learn how to execute ad-hoc […]

Predictive Modeling, Supervised Machine Learning, and Pattern Classification

When I was working on my next pattern classification application, I realized that it might be worthwhile to take a step back and look at the big picture of pattern classification in order to put my previous topics into context and to provide an introduction for the future topics that are going to follow. Pattern […]

Predictive analytics is not enough

The idea of predictive analytics can seem like magic: how, really, can a computer predict the future? Yet we’ve seen a lot of success based on this advanced technology in recent years, from Netflix to Amazon, Google, and more. These companies mine a massive amount of data every day for patterns, and it drives massive […]

Deutsch Credit Future Telling: part 2

To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1. Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy while the other two are still around 75%. Where there were three on this road, there is now one. The next step […]

Deutsch Credit Future Telling: part 1

Classification tasks come up frequently in data science, but the hardest are those with unbalanced classes. From biology to finance, real-life examples are numerous. Before balancing your errors, establishing a baseline that always predicts the most frequent class can give you over 90% accuracy right off the bat. The question of whether it is worse to have […]
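
The most-frequent-class baseline mentioned in this excerpt can be sketched in a few lines of Python. This is only an illustration, not the post's own code: the function name `majority_baseline_accuracy` and the 92/8 label split are made up for the example.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)


# Hypothetical, made-up labels: 92 negatives (0) and 8 positives (1).
labels = [0] * 92 + [1] * 8
label, accuracy = majority_baseline_accuracy(labels)
print(f"always predict {label}: accuracy = {accuracy:.2f}")
# prints "always predict 0: accuracy = 0.92"
```

With classes this unbalanced, a model has to beat 92% before its accuracy says anything useful, which is why the baseline is worth computing first.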

Win Customer Loyalty with Predictive Analytics

Winning your customer for life is a challenging task for organizations. How can you connect with your customer, and how can you ensure that they stay with your organization for a long time? These are questions that many organizations face. Fortunately, with the advance of big data and analytics, it has become a little bit easier for […]

Prediction Machine Designed with Spark, Kudu, and Impala

This was originally posted on the Silicon Valley Data Science blog. Why should your infrastructure maintain a linear growth pattern when your business scales up and down during the day based on natural human cycles? There is an obvious need to maintain a steady baseline infrastructure to keep the lights on for your business, but […]