In business, the value of being able to accurately predict outcomes – asset failures, market fluctuations, or customer churn, for example – can scarcely be overstated. The growth of data analytics in business in recent years is largely attributable to growing demand for predictive modeling. Today, artificial intelligence is making it possible to automate much of the predictive modeling done by businesses through a process aptly named automated model building (AMB).
How AMB works
Much of the AMB process closely resembles the traditional analytics workflow. Data goes through a cleaning process, whereby the variables are formatted and any missing values are filled in. Next, the data is labeled such that the modeler’s algorithms can understand it.
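To make the cleaning step concrete, here is a minimal sketch of one common way to fill in missing values: replacing each gap with the median of its column. The field names and records are hypothetical, and a real builder may well use a more sophisticated imputation strategy.

```python
from statistics import median

def fill_missing_with_median(rows, columns):
    """Return a copy of rows where None values are replaced by the column median."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)
    return [
        {col: (row[col] if row[col] is not None else medians[col]) for col in columns}
        for row in rows
    ]

# Hypothetical records; None marks a missing value.
raw = [
    {"temp": 70.0, "pressure": 30.0},
    {"temp": None, "pressure": 32.0},
    {"temp": 74.0, "pressure": None},
    {"temp": 72.0, "pressure": 34.0},
]
clean = fill_missing_with_median(raw, ["temp", "pressure"])
print(clean[1]["temp"], clean[2]["pressure"])
```

The median is used here rather than the mean because it is less sensitive to outliers, though either is a reasonable starting point.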
What makes AMB different from traditional analytics is its capability to rapidly and continuously process the data to “learn” which information is most important and use it to build optimal machine learning models. Whereas data scientists may take weeks or months to clean data and to build and tune models, an effective automated model builder can often do the same work in hours. That’s because data scientists often must try to intuit which kind of machine learning model will work best to predict a business outcome, whereas the builder can simply test hundreds or thousands of candidate models in quick succession and keep the one that most successfully predicts the data as it’s fed in.
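The "try many models and keep the best" idea can be sketched in a few lines. The example below is only an illustration under simple assumptions: the candidates are k-nearest-neighbor regressors that differ in k, the data is synthetic, and each candidate is scored on a held-out validation split. Commercial builders search far larger and more varied model families.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(train, validation, k):
    """Average squared prediction error of a k-NN model on held-out points."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)

def select_best_k(train, validation, candidate_ks):
    """Score every candidate on held-out data and return (best_k, scores)."""
    scores = {k: mean_squared_error(train, validation, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic data: y = 2x plus noise (illustrative only).
rng = random.Random(0)
data = [(x, 2 * x + rng.gauss(0, 1)) for x in range(60)]
rng.shuffle(data)
train, validation = data[:40], data[40:]

best_k, scores = select_best_k(train, validation, candidate_ks=[1, 3, 5, 10, 40])
print("best k:", best_k, "validation MSE:", round(scores[best_k], 3))
```

Scoring on data the candidates never saw during fitting is what keeps the search from simply rewarding the model that memorizes best.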
Slide from Sourabh Chaki’s presentation ‘Automated Machine Learning Using Spark Mllib to Improve Customer Experience’ at SparkSummit 2015
An AMB use case
Consider one use case: AMB for “predictive maintenance” on industrial assets. Most people won’t be surprised to know that expensive industrial machinery – turbines, engines, wells and so forth – is extremely costly to maintain. What may surprise you, however, is that many operators of such equipment rely on maintenance practices that have remained unchanged for decades. When a machine stops working properly, mechanics are sent into the field or factory floor to try to diagnose the problem and fix it. If they can’t, it could mean hours or days of downtime, entailing exorbitant opportunity costs for the operator.
A good automated model builder can turn this kind of “reactive” maintenance on its head. Using data from sensors attached to assets, it chooses the best of hundreds of automatically-built models to accurately predict the machinery’s behavior. When the machine learning algorithms detect abnormal signals in the data, the program immediately alerts the operator with information on the extent of the problem as well as its exact location in the asset. AMBs have been known to predict failures or mishaps weeks or months in advance, allowing mechanics to preempt such issues altogether.
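As a rough illustration of what "detecting abnormal signals" can mean, the sketch below flags any sensor reading that sits more than three standard deviations from the mean of the readings just before it. The readings, window size, and threshold are all made up for the example, and production systems typically rely on learned models rather than a fixed rule like this one.

```python
from statistics import mean, stdev

def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose reading deviates from the trailing window's mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings with one obvious spike at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 4.0, 1.0, 1.1]
alerts = find_anomalies(vibration)
print(alerts)  # → [7]
```

Note that the spike inflates the standard deviation of the windows that follow it, so a rule this simple can miss a second fault arriving right after the first.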
Some leading AMBs
While the ideas behind AMB aren’t new, it’s only recently that AI developers have gained access to the kind of computational resources necessary to make it work in the context of industry. Massively parallel processing – coordinated work on a computational task among multiple processors – has enabled model building on the kind of vast scale that AMB requires. Now, several companies offer fully automated model builders for enterprise clients, each with a slightly different focus. Here are a few:
- DataRobot is a Boston-based company whose AMB platform centers around use cases in finance, insurance, and marketing. Founded in 2012, the company was among the first to bring the technology to the mainstream. Chief Scientist of DataRobot Michael Schmidt will be presenting on AMB applications in these areas at this year’s ODSC Europe conference.
- H2O is an AI company headquartered in Mountain View, California, that offers what it calls “driverless” AI to enterprise clients in finance, insurance, healthcare, retail, telco, sales, and marketing. The company is unique in also offering an open source AMB platform that’s accessible through a variety of programming languages, including R and Python. H2O’s open source platform has been a part of several ODSC conferences and will be featured in workshop sessions at ODSC Europe 2018.
- SparkCognition is an Austin-based AI company that just released a new AMB product named Darwin. Darwin differs from other AMB platforms by taking a “genetic” approach to deep learning, meaning that in addition to model building it also combines (“breeds”) successful models to produce even better ones. SparkCognition’s Director of Product Management Keith Moore will be discussing the new approach at this year’s ODSC West conference.
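The “breeding” idea mentioned in the last item can be illustrated with a toy genetic algorithm. To be clear, this is not Darwin's actual method, which isn't described here; it is a generic sketch in which each "model" is just a (slope, intercept) pair, the fittest quarter of each generation survives, and children are made by mixing two parents' parameters and adding a little random mutation.

```python
import random

def mse(model, data):
    """Mean squared error of a (slope, intercept) linear model."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)

def breed(parent_a, parent_b, rng, mutation_scale=0.1):
    """Crossover: take each parameter from one parent at random, then mutate."""
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    return tuple(gene + rng.gauss(0, mutation_scale) for gene in child)

def evolve(data, generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5))
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=lambda m: mse(m, data))
        best_per_generation.append(mse(population[0], data))
        # Keep the fittest quarter unchanged, then refill with their offspring.
        survivors = population[: population_size // 4]
        children = [
            breed(rng.choice(survivors), rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=lambda m: mse(m, data))
    return best, best_per_generation

# Synthetic target: y = 3x + 1 (illustrative only).
data = [(x, 3 * x + 1) for x in range(10)]
best_model, history = evolve(data)
print("first generation error:", round(history[0], 3))
print("last generation error:", round(history[-1], 3))
```

Because the survivors are carried over untouched, the best error can never get worse from one generation to the next; the mutation step is what lets the population find improvements neither parent had.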
AMB in practice: Predicting Boston home prices
As part of its recent release, SparkCognition’s Darwin is on offer to the public with a 15-day free trial, which allowed me to test the AMB for a predictive modeling case (you can also request a trial of H2O’s enterprise builder if you’re with a company).
For this test, I ran a simple supervised regression on the famous Boston Housing dataset. Essentially, I gave Darwin labeled data on several hundred Boston-area census tracts (crime rate, average number of rooms per home, distance to employment centers, and so forth) to see whether it could build its own model to accurately predict the median home value in each tract.
For those unfamiliar with the Boston dataset, here’s what a head() call looks like.
After feeding in the .csv file I watched the program build model after model with a variety of machine learning techniques, combining the most successful ones to create new model “generations” (hence the evolutionary machine learning approach, and the name “Darwin”). A few seconds later, it informed me that the 56th generation of its models was most successful at predicting the home values. This generation was a neural network with the following feature importance:
In other words, Darwin found that these were the 10 most important factors in predicting the homes’ assessed values in Boston, weighted by the numbers on the right.
Finally, the AMB showed me the success rate of the model from generation 56:
What you’re looking at here are the predicted home values plotted against the actual values, with a coefficient of determination of 0.98298. Essentially, this means the model’s predictions, given the other data, explain just over 98% of the variance in the actual home values. Not bad!
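For reference, the coefficient of determination (R²) compares a model's squared error with the squared error you would get by always guessing the average value. A score of 1 means perfect predictions and a score of 0 means no better than guessing the average. The sketch below computes it from scratch on made-up numbers, not on Darwin's output.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up values for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r_squared(actual, predicted)
print(round(score, 3))  # → 0.948
```

Keep in mind that R² measured on the same data a model was fitted to tends to look better than it would on new data, so a held-out test set gives a fairer picture.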
More testing would be necessary to see how well Darwin, DataRobot, H2O, and other AMBs would work in different types of business cases. The enterprising data scientist with access to all of them could test them with more complex datasets, and on unsupervised learning cases.
In a world of more and more data and increasingly sophisticated AI, AMB seems poised to gain widespread use. Time will tell what kinds of automated models will be built for business, and how they’ll generate value for their users.