

Darwin: Machine Learning Beyond Predefined Recipes
Sponsored post by SparkCognition | March 26, 2019

While machine learning has enabled massive advancements across industries, it requires significant development and maintenance effort from data science teams. The next evolution in machine intelligence is automating the creation of machine learning models that do not follow predefined formulas, but rather adapt and evolve according to the problem’s data. Just as a tailored suit looks and feels different from off-the-rack options because it actually fits, tailored models perform differently from pre-established, boxed algorithms because they are custom-fitted to your data.
To answer this need, SparkCognition has developed Darwin™, a machine learning product that automates the building and deployment of models at scale. Darwin uses a patented approach based on neuroevolution that custom-builds model architectures to ensure the best fit for the problem at hand. Rather than simply choosing the best performer from a predefined list of algorithms, Darwin uses a blend of evolutionary and deep learning methods to iteratively find the optimal model for your data. This automated model-building process creates unique solutions that generate accurate predictions for your unique data problems.
How Does Darwin Work?
Darwin automates three major steps in the data science process: cleaning, feature generation, and the construction of either a supervised or unsupervised model. Each of these steps is performed within a single generation of Darwin’s evolutionary process, which contains dozens of model architecture candidates. At the end of each generation, Darwin keeps the best performers, analyzes their architectural characteristics, and spawns a new generation of models based on these traits. In this way, Darwin automatically generates thousands of models that evolve and improve with each generation to more accurately reflect the relationships in your data.
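To make the generational loop concrete, here is a minimal Python sketch of this kind of evolutionary search. The population size, scoring stub, and mutation scheme are illustrative assumptions, not Darwin’s actual implementation.

```python
# A hedged sketch of a generational neuroevolution-style search loop.
# random_architecture, mutate, and train_and_score are hypothetical stand-ins.
import random

POPULATION_SIZE = 24   # "dozens of model architecture candidates"
NUM_GENERATIONS = 50
KEEP_FRACTION = 0.25   # keep the best performers each generation

def random_architecture():
    """Sample a candidate architecture (layer widths, activation)."""
    return {
        "layers": [random.choice([16, 32, 64, 128]) for _ in range(random.randint(1, 4))],
        "activation": random.choice(["relu", "tanh"]),
    }

def mutate(architecture):
    """Perturb one architectural trait to spawn a child candidate."""
    child = dict(architecture, layers=list(architecture["layers"]))
    child["layers"][random.randrange(len(child["layers"]))] = random.choice([16, 32, 64, 128])
    return child

def train_and_score(architecture, data):
    """Train the candidate on `data` and return a validation score (stubbed here)."""
    return random.random()  # placeholder for a real training/validation run

def evolve(data):
    population = [random_architecture() for _ in range(POPULATION_SIZE)]
    for _ in range(NUM_GENERATIONS):
        # Rank candidates by validation performance.
        scored = sorted(population, key=lambda a: train_and_score(a, data), reverse=True)
        survivors = scored[: int(KEEP_FRACTION * POPULATION_SIZE)]
        # Spawn a new generation from the survivors' architectural traits.
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(POPULATION_SIZE - len(survivors))
        ]
    return population[0]
```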
Let’s take a look at each individual step of this process:
Cleaning
First, Darwin needs to convert data sets into a form usable for algorithmic development. This includes representing categorical data numerically and extracting features that preserve temporal relationships in date/time information. The data is also scaled and normalized so that features can be compared to one another.
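As a rough illustration of these cleaning steps, the pandas/scikit-learn sketch below one-hot encodes a categorical column, extracts date/time features, and scales the numeric columns. The toy dataframe and column names are assumptions for illustration, not Darwin’s internal pipeline.

```python
# A minimal cleaning sketch: categorical encoding, date/time features, scaling.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "plan": ["basic", "premium", "basic"],                  # categorical column
    "signup": ["2018-01-05", "2018-03-20", "2018-07-11"],   # date/time column
    "monthly_minutes": [120.0, 340.5, 80.2],                # numeric column
})

# 1. Represent categorical data numerically (one-hot encoding).
df = pd.get_dummies(df, columns=["plan"])

# 2. Extract features that preserve temporal information.
df["signup"] = pd.to_datetime(df["signup"])
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek
df = df.drop(columns=["signup"])

# 3. Scale so features can be compared to one another.
numeric_cols = ["monthly_minutes", "signup_month", "signup_dayofweek"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
```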
Feature Generation
Once the data has been cleaned, data scientists often manipulate it to generate features better suited to the problem at hand. One of the biggest challenges in handling dynamic time series data is determining how to window the time steps for feature generation. Darwin automates this windowing process using one-dimensional convolutional neural networks (CNNs). CNNs are a class of deep neural networks, built from multilayer perceptrons, that are designed to require only minimal preprocessing: the network automatically learns the filters that would traditionally need to be engineered by hand.
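A minimal Keras sketch of this idea is shown below: a small 1-D CNN whose learned filters stand in for hand-engineered window features. The window length, channel count, and layer sizes are assumptions for illustration, not a topology Darwin would produce.

```python
# A hedged sketch of a 1-D CNN over windowed time series data.
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, CHANNELS = 32, 4  # assumed window length and number of input signals

model = keras.Sequential([
    layers.Input(shape=(WINDOW, CHANNELS)),
    # The convolutional filters play the role of hand-engineered window features.
    layers.Conv1D(filters=16, kernel_size=5, activation="relu"),
    layers.Conv1D(filters=16, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```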
Feature Selection and Model Building
Once automated cleaning and feature generation have taken place, the data set is ready to be used to build a model. Through neuroevolution, Darwin can build two types of models: supervised learning models and normal behavior models. These approaches differ in how they work and in the problems they solve.
Supervised Learning Models
For supervised learning problems, the goal of Darwin is to ingest the cleaned input data and automatically produce a highly optimized machine learning model which can accurately predict a target of interest specified by the user. Darwin accomplishes this using a patented evolutionary algorithm which simultaneously optimizes and compares various machine learning methodologies, most heavily favoring deep neural networks.
Darwin begins by analyzing the characteristics of the input dataset and the specified problem, and then applies past knowledge to construct an initial population of machine learning models that are likely to produce accurate predictions on the problem. Traits from the best-performing models are then combined over many generations to yield even better models. This ensures a final model that is highly optimized for the specified problem.
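As a rough sketch of how traits from strong candidates might be recombined, the snippet below crosses over two parent “genomes.” The genome format and traits are invented for illustration and do not reflect Darwin’s patented representation.

```python
# A hedged sketch of recombining architectural traits from two strong candidates.
import random

def crossover(parent_a, parent_b):
    """Mix traits (layer widths, activation, learning rate) into a child genome."""
    return {
        "layers": random.choice([parent_a["layers"], parent_b["layers"]]),
        "activation": random.choice([parent_a["activation"], parent_b["activation"]]),
        "learning_rate": (parent_a["learning_rate"] + parent_b["learning_rate"]) / 2,
    }

parent_a = {"layers": [64, 32], "activation": "relu", "learning_rate": 1e-3}
parent_b = {"layers": [128, 64, 32], "activation": "tanh", "learning_rate": 1e-4}
child = crossover(parent_a, parent_b)  # candidate for the next generation
```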
Normal Behavior Modeling
Just as Darwin uses an evolutionary algorithm to solve supervised problems, it can also identify relationships in data that drift over time, using a technique called normal behavior modeling. Darwin performs normal behavior modeling through an autoencoder, a neural network-based approach to dimensionality reduction. Autoencoders compress data to reduce the feature set to the smallest size possible, and then decompress it with as little loss as possible.
Like any other neural network, autoencoders have numerous hidden layers, a defined latent space, and different activation functions in their encoding/decoding process. Darwin automates the creation of this network topology, and then performs backpropagation with dropout to reduce the output loss via weight optimization. When deployed in production, the model’s ability to reconstruct data over time helps to identify shifting relationships in data.
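For intuition, here is a minimal Keras autoencoder with dropout that flags drift through reconstruction error. The topology, dropout rate, and loss are assumptions made for this sketch; Darwin evolves these choices automatically.

```python
# A hedged sketch of a dropout autoencoder used for normal behavior modeling.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES = 20   # assumed input width
LATENT_DIM = 4    # assumed size of the latent space

autoencoder = keras.Sequential([
    layers.Input(shape=(N_FEATURES,)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.2),                            # dropout during training
    layers.Dense(LATENT_DIM, activation="relu"),    # compressed latent representation
    layers.Dense(16, activation="relu"),
    layers.Dense(N_FEATURES, activation="linear"),  # reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")

# In production, rising reconstruction error signals shifting relationships in the data.
def reconstruction_error(x):
    return np.mean(np.square(x - autoencoder.predict(x, verbose=0)), axis=1)
```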
Darwin uses this approach to build models that go beyond a traditional “risk index” and can identify anomalous operations and systems failures.
How good is this process? Read Darwin’s Efficacy Report to learn more.
Use Cases for Darwin
Darwin’s technology can be applied to solve a wide range of problems, such as quality prediction on manufacturing processes, inventory optimization for maintenance operations, automated support ticket triage, and fraud detection. The following use case explores the use of Darwin to generate customer churn predictions in the telecommunications industry.
Predicting Customer Churn for a Telecommunications Operator
The Problem
Improving customer retention rates is key to remaining competitive and profitable in the telecommunications industry. However, understanding the impact of churn rates across different areas of the organization is just one part of the equation. A more difficult challenge is to move beyond traditional statistics-based solutions to increase the accuracy of churn predictions and widen the window of opportunity to implement corrective business strategies. This requires dynamic, highly accurate machine learning solutions that directly act on customer data to make predictions, recommend actions, and measure their impact.
The Solution
A major telecommunications operator made use of Darwin to build models for a machine learning-based customer churn solution. Using data such as account length, area code, international minutes, long-distance minutes, and more, Darwin’s models were able to accurately predict churn and provide customized AI-generated deals for individual customers.
Darwin’s custom-built machine learning churn models adapt to changes in the data to continuously make accurate predictions. This is achievable thanks to Darwin’s patented model building approach, using neuroevolution to directly act on data and build the optimal model for the problem at hand.
Results
Darwin’s initial models for the telecommunications operator had a cross entropy loss of 0.19 and roughly 95% accuracy. These models allowed for proactive detection of churn events based on real-time tracking of customer data, preferences, and usage patterns.
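For reference, the two reported metrics can be computed with scikit-learn as in the sketch below; the labels and predicted probabilities shown are invented for illustration and are not the operator’s data.

```python
# A hedged sketch of computing cross-entropy loss and accuracy for churn predictions.
from sklearn.metrics import log_loss, accuracy_score

y_true = [0, 1, 0, 0, 1, 0, 1, 0]                           # hypothetical churn labels
y_prob = [0.05, 0.92, 0.10, 0.20, 0.85, 0.08, 0.70, 0.15]   # hypothetical churn probabilities

print("cross-entropy loss:", log_loss(y_true, y_prob))
print("accuracy:", accuracy_score(y_true, [p >= 0.5 for p in y_prob]))
```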
Darwin can also be used to generate models in the decision engine of the application to customize deals on a per user basis and analyze the expected impact of the implementation. Furthermore, Darwin exposes the factors driving each prediction on a per customer basis to further customize recommendation strategies.
How Can I Experiment with Darwin?
Take the next step in your machine learning journey with Darwin’s automated model building approach.
Evaluate Darwin today here: https://www.sparkcognition.com/darwin-trial/