Data scientists competing in Kaggle competitions often come up with winning solutions using ensembles of advanced machine learning algorithms. One model that typically appears in such ensembles is the Gradient Boosting Machine (GBM). Gradient boosting is a machine learning method for regression and classification problems that combines an “ensemble” of weak prediction models (typically decision trees) into a powerful “committee.”
The technique dates back to Jerome H. Friedman’s seminal 1999 paper, “Greedy Function Approximation: A Gradient Boosting Machine” (later published in The Annals of Statistics in 2001). I recommend that all data scientists read this important resource to better understand the genesis of one of today’s most popular statistical learning methods.
Since then, a number of important innovations have extended the original GBMs: H2O, XGBoost, LightGBM, CatBoost. Most recently, another algorithm surfaced by way of a new arXiv.org paper, appearing on Oct. 9, 2019: “NGBoost: Natural Gradient Boosting for Probabilistic Prediction,” by the Stanford ML Group. It is built on top of scikit-learn and is designed to be scalable and modular with respect to the choice of proper scoring rule, distribution, and base learners. In this article, I’ll give an overview of the NGBoost algorithm and its place in the boosting trajectory.
[Related Article: XGBoost: Enhancement Over Gradient Boosting Machines]
The idea behind the algorithm is rather straightforward: train the model to output, for each training example, a full probability distribution whose parameters minimize a proper scoring rule. Natural gradient descent is used as the optimization algorithm, and the model is an ensemble of weak base learners combined via boosting.
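As a minimal illustration of that idea (a from-scratch sketch, not the NGBoost implementation), here is plain gradient descent on the log score of a single Normal distribution, parameterized by mu and log sigma so both parameters are unconstrained:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=5000)  # observed outcomes

# Parameters of the predicted Normal distribution; log(sigma) keeps sigma positive.
mu, log_sigma = 0.0, 0.0
lr = 0.1
for _ in range(500):
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    # Gradients of the mean negative log likelihood (the log score)
    grad_mu = np.mean(-z / sigma)
    grad_log_sigma = np.mean(1.0 - z ** 2)
    mu -= lr * grad_mu
    log_sigma -= lr * grad_log_sigma

print(mu, np.exp(log_sigma))  # converges to roughly mu = 3, sigma = 2
```

NGBoost generalizes this picture in two ways: the parameters become per-example functions of the features, fit by boosted base learners, and the ordinary gradient is replaced by the natural gradient.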
NGBoost is a fast, flexible, and easy-to-use algorithm for probabilistic regression. Since its release, some confusion has surfaced about the scope of that term: NGBoost is not configured to work on classification problems, which is a significant deviation from the other boosting algorithms. The authors have indicated that classification support is coming; they didn’t focus on it initially because classification is typically already probabilistic.
Natural Gradient Boosting
NGBoost uses natural gradient boosting, a modular boosting algorithm for probabilistic prediction. The algorithm consists of three abstract modular components: a base learner, a parametric probability distribution, and a scoring rule. All three are chosen in advance of training. Let’s examine these terms:
Source: “NGBoost: Natural Gradient Boosting for Probabilistic Prediction,” by Duan, et al. (2019)
- Base learners – The most common choice is decision trees, which tend to work well on structured inputs.
- Parametric probability distribution – The distribution needs to be compatible with the output type, e.g. Normal distribution for real-valued outputs, Bernoulli for binary outputs.
- Scoring Rule – The logarithmic score, whose minimization corresponds to Maximum Likelihood Estimation (MLE), is an obvious choice. More robust rules such as the Continuous Ranked Probability Score (CRPS) are also suitable.
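To make the scoring rules concrete, here is a small sketch (assuming NumPy and SciPy) of both rules for a Normal predictive distribution; the quadratic-versus-linear growth in the error is why CRPS is considered the more robust choice:

```python
import numpy as np
from scipy.stats import norm

def log_score(y, mu, sigma):
    # Negative log likelihood of y under Normal(mu, sigma): the MLE scoring rule.
    return -norm.logpdf(y, loc=mu, scale=sigma)

def crps_normal(y, mu, sigma):
    # Closed-form CRPS for a Normal predictive distribution.
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# An outlier hurts the log score far more than CRPS:
print(log_score(10.0, 0.0, 1.0))   # ~50.9, grows quadratically with the error
print(crps_normal(10.0, 0.0, 1.0)) # ~9.4, grows only linearly
```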
The key NGBoost innovation is in employing the natural gradient to perform gradient boosting by casting it as a problem of determining the parameters of a probability distribution.
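The distinction between ordinary and natural gradients can be sketched for a single Normal prediction parameterized by (mu, log sigma), for which the Fisher information matrix has the closed form diag(1/sigma^2, 2); the natural gradient preconditions the ordinary gradient by the inverse of this matrix:

```python
import numpy as np

def ordinary_and_natural_grad(y, mu, log_sigma):
    # Gradient of the per-example log score (negative log likelihood)
    # of Normal(mu, sigma) with respect to theta = (mu, log_sigma).
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    grad = np.array([-z / sigma, 1.0 - z ** 2])
    # Fisher information for this parameterization is diagonal: diag(1/sigma^2, 2).
    fisher = np.diag([1.0 / sigma ** 2, 2.0])
    natural = np.linalg.solve(fisher, grad)
    return grad, natural

grad, nat = ordinary_and_natural_grad(y=1.0, mu=0.0, log_sigma=np.log(0.1))
print(grad)  # roughly [-100, -99]: the ordinary gradient blows up as sigma shrinks
print(nat)   # roughly [-1, -49.5]: the natural gradient is rescaled by the geometry
```

Note that the natural gradient in mu reduces to -(y - mu) regardless of sigma, which is part of what makes the updates well-behaved across examples with very different predicted scales.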
Predictive Uncertainty Estimation
NGBoost enables predictive uncertainty estimation with gradient boosting through probabilistic predictions, including real-valued outputs. Using natural gradients, NGBoost is able to overcome technical challenges that make generic probabilistic prediction difficult with gradient boosting.
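Putting the components together, here is a from-scratch sketch of NGBoost-style training (an illustration under simplifying assumptions, not the authors' implementation), boosting one scikit-learn regression tree per distribution parameter on the negative natural gradient of the Normal log score:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = 2 * np.sin(X[:, 0]) + rng.normal(scale=0.5, size=2000)

n_stages, lr = 100, 0.1
# Per-example distribution parameters theta = (mu, log_sigma), initialized globally.
mu = np.full(len(y), y.mean())
log_sigma = np.full(len(y), np.log(y.std()))

trees = []  # kept so new points could be scored by replaying the stages
for _ in range(n_stages):
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    # Ordinary gradients of the log score w.r.t. (mu, log_sigma)...
    grad_mu = -z / sigma
    grad_ls = 1.0 - z ** 2
    # ...preconditioned by the inverse Fisher information diag(1/sigma^2, 2).
    nat_mu = grad_mu * sigma ** 2
    nat_ls = grad_ls / 2.0
    # One base learner per parameter, fit to the negative natural gradient.
    t_mu = DecisionTreeRegressor(max_depth=3).fit(X, -nat_mu)
    t_ls = DecisionTreeRegressor(max_depth=3).fit(X, -nat_ls)
    mu += lr * t_mu.predict(X)
    log_sigma += lr * t_ls.predict(X)
    trees.append((t_mu, t_ls))

print(np.mean(np.exp(log_sigma)))  # shrinks toward the true noise level, 0.5
```

Notice that the natural-gradient target for mu works out to the plain residual y - mu, so the mean is fit exactly as in classic gradient boosting, while a second stream of trees simultaneously learns the scale.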
Estimating the uncertainty in the predictions of a machine learning model is important for real-world production deployments. It’s important for models to make accurate predictions, but we also want a correct estimate of uncertainty along with each prediction.
Probabilistic prediction, which is the approach where the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Compare the point predictions vs probabilistic predictions in the following examples.
Source: Stanford ML Group
NGBoost makes it easier to do probabilistic regression with flexible tree-based models. Probabilistic classification, by contrast, has been possible for quite some time, since most classifiers are actually probabilistic classifiers in that they return probabilities over each class. For instance, logistic regression returns class probabilities as output. In this light, NGBoost doesn’t add much new.
Not all classifiers output probabilities, however; some return only the most likely class label. The same is true of most regression algorithms, where you get back a single real number as the expected outcome. Consider linear regression, whose predict function returns a single value rather than a distribution over all possible values. In this case, NGBoost adds a way of doing probabilistic regression.
[Related Article: Best Machine Learning Research of 2019]
One example applies NGBoost to Kaggle’s NFL Big Data Bowl competition data set. In initial benchmarks against other popular boosting algorithms like LightGBM, and bagging algorithms like Random Forest, NGBoost’s performance was slightly better, although this is just the beginning, as there are many avenues for future work.
The GitHub repo for NGBoost can be found at github.com/stanfordmlgroup/ngboost.