Editor’s note: David Koll is a speaker for ODSC West 2022 this November 1st-3rd. Be sure to check out his talk, “Any Way You Want It: Integrating Complex Business Requirements into ML Forecasting Systems,” there to learn more about building forecasting solutions!
Forecasting is arguably one of the most traditional applications for data-driven solutions. Enterprises are attracted by the potential gains from automating their forecasting processes across the value chain. Topics such as demand and sales forecasting or capacity planning are core processes that are often still executed largely by hand and could therefore greatly benefit from automation.
However, automated forecasting solutions are tricky to build in real-life enterprise environments. The reasons include, among others:
- Within a single forecasting problem, time series data will likely be heterogeneous, requiring a careful segmented selection of forecasting models.
- Metrics like RMSE, MAE, or (s)MAPE in most cases are not enough to adequately describe the performance of a model, even if a human or system benchmark exists.
- Besides performance, many different business requirements need to be incorporated. These include the consideration of strategic business goals, the limitation of complexity in favor of run time, or the requirement of predictions that are explainable to the end users of the product.
At the heart of each forecasting solution are predictive models. The selection of a model for a forecasting task has implications for the entire forecasting pipeline. While more traditional models such as AutoARIMA or Exponential Smoothing model each time series independently, Machine Learning approaches learn patterns from all series (or subgroups of series) at once. Similarly, traditional models usually expect their input in a long format, while ML solutions require a wide tabular data format. Also, classical algorithms can usually predict n steps ahead in a single pass, while ML models need either an explicit recursive forecasting approach or one model per horizon. Finally, some models work only on univariate time series data, while others can easily incorporate multiple variables (including categorical ones).
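To make the long-versus-wide distinction concrete, here is a minimal sketch of turning a long-format demand table into the wide tabular features an ML model expects. Column names, the number of lags, and the toy data are assumptions for illustration, not the article's actual pipeline.

```python
import pandas as pd

# Hypothetical long-format input: one row per (series, date) observation.
long_df = pd.DataFrame({
    "series_id": ["A"] * 6 + ["B"] * 6,
    "date": pd.date_range("2022-01-01", periods=6, freq="MS").tolist() * 2,
    "demand": [10, 12, 9, 14, 13, 15, 100, 90, 110, 95, 105, 98],
})

def make_wide_features(df, n_lags=3, horizon=1):
    """Turn a long demand table into wide rows: lag features as columns
    plus the target value `horizon` steps ahead of the last lag."""
    frames = []
    for sid, grp in df.sort_values("date").groupby("series_id"):
        feat = pd.DataFrame({"series_id": sid, "date": grp["date"].values})
        for lag in range(1, n_lags + 1):
            feat[f"lag_{lag}"] = grp["demand"].shift(lag).values
        feat["target"] = grp["demand"].shift(-(horizon - 1)).values
        frames.append(feat)
    # Rows without a full lag window are dropped.
    return pd.concat(frames).dropna().reset_index(drop=True)

wide_df = make_wide_features(long_df, n_lags=3, horizon=1)
```

For a different horizon, the target column shifts accordingly, which is exactly why the direct approach ends up with one feature table and one model per horizon.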
The typical forecasting scenarios at Continental are not pretty. We offer an extremely complex product portfolio in a business world that was already rapidly changing before the black swan events of COVID and the Ukraine war. For instance, in the automotive sector, our company produces everything from brakes to cockpit displays or single hardware sensors. The picture below shows the wide range of products that have made their way into a single car, the new VW ID Buzz electric vehicle:
Many of these products will run some sort of software, and each software update essentially creates a new product version that needs to be forecasted in terms of demand and sales. Additionally, being mostly new products supporting the transition to electric vehicles, a solid demand history is often not available. Similarly, in the tires world, we have thousands of different tires in each of our markets worldwide in an often-erratic B2B environment – as we do not sell directly to the end user.
Consequently, within a certain forecasting project, it is a given that we will encounter all the types of time series shown below.
Figure 1: Different types of time series encountered in a typical project.
In other words, time series may show predictable patterns (e.g., seasonal (a)), but in most cases are either without a clear pattern (b), sporadic (c), or short-lived with rapid ramp-ups and ramp-downs (d).
This situation mandates a careful model selection for each time series. Our typical forecasting best practice is thus to:
- Cluster time series according to their type
- Preprocess data such that it is available in both wide and long format
- Run model selection based on historical (backtested) performance for each cluster
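The three steps above can be sketched in a few lines. The forecasters and the cluster data below are illustrative stand-ins (a naive and a moving-average model on toy series), not Continental's actual models; the point is only the mechanic of picking a model per cluster from backtested error.

```python
import numpy as np

def naive_last(history):           # repeat the most recent value
    return history[-1]

def moving_average(history, w=3):  # mean of the last w observations
    return np.mean(history[-w:])

def backtest_rmse(series, forecaster, n_test=4):
    """One-step-ahead backtest: forecast each of the last n_test points
    from the history before it, then compute the RMSE."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        pred = forecaster(series[:t])
        errors.append((series[t] - pred) ** 2)
    return np.sqrt(np.mean(errors))

def select_model(cluster_series, forecasters):
    # Average backtest RMSE across all series in the cluster,
    # then keep the best-performing forecaster.
    scores = {
        name: np.mean([backtest_rmse(s, f) for s in cluster_series])
        for name, f in forecasters.items()
    }
    return min(scores, key=scores.get)

forecasters = {"naive": naive_last, "moving_avg": moving_average}
seasonal_cluster = [np.array([10, 20, 10, 20, 10, 20, 10, 20, 10, 20])]
best = select_model(seasonal_cluster, forecasters)
```

In a real project the candidate set would contain the statistical and ML models discussed above, and the backtest would use a proper rolling-origin scheme per horizon.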
In the vast majority of cases, we have found that a direct forecasting approach (i.e., one model per forecasting horizon and cluster) using Gradient Boosting methods (e.g., XGBoost, LightGBM) works best for our product range. The reasons include:
- Machine Learning can natively exploit a wide range of features. In our context, the best predictor is often our order book at the time of forecasting (i.e., knowledge about future sales), which can easily be included as a feature for an ML model, in contrast to traditional forecasting algorithms.
- Gradient Boosting has one of the lowest computational overheads among ML methods (as opposed to, e.g., the various approaches in Deep Learning).
- It also accepts historical N/A values as input, which occur frequently when forecasting new products. Traditional forecasting approaches often simply do not work at all with only a few historical data points.
- Tree-based models are, after some training, interpretable to non-experts to a useful extent, which enables building trust with the end users of the solution.
With this strategy, we have been able to consistently build forecast models that significantly outperform not only human and system benchmarks but also commercial forecasting solutions.
Metrics and Benchmarks
However, outperforming a human benchmark is typically not the success criterion of a forecasting product, but rather the point at which the real work starts. The main issue is that a single metric such as the Root Mean Squared Error (RMSE) cannot adequately judge the actual quality of a forecast. For illustration, consider the sporadic time series from our examples above. Figure 2 below depicts two potential forecasts for this time series.
In our experience, a machine learning model that optimizes an error metric like RMSE will eventually arrive at a flat forecast (Forecast 1 above) for this type of demand pattern, since distributing the expected volume evenly across all steps yields the lowest RMSE. In the example above, a flat forecast of 2 at every step against a true demand of 40 at time step 8 (and 0 elsewhere) results in an error of ~39 (measured as the root of the summed squared errors). Forecast 2, which hits the demand volume perfectly and is simply off by one forecast step, scores ~57, in other words about 50% higher than Forecast 1. However, from a business perspective, Forecast 2 is preferable, as it will steer the supply chain more appropriately.
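This effect is easy to reproduce in code. Below, the spike example is recomputed over an assumed 20-step window; the absolute RMSE values depend on the window length, but the roughly 45-50% relative gap between the two forecasts does not.

```python
import numpy as np

# Sporadic demand: a single spike of 40 at time step 8, zero elsewhere.
demand = np.zeros(20)
demand[7] = 40

flat = np.full(20, 2.0)    # Forecast 1: flat 2 at every step
shifted = np.zeros(20)
shifted[8] = 40            # Forecast 2: correct volume, one step late

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# The flat forecast scores the lower RMSE even though the shifted
# forecast is far more useful for steering the supply chain.
```

Here `rmse(demand, flat)` is about 8.7 versus about 12.6 for the shifted forecast, so a model optimizing RMSE is pushed toward the operationally useless flat line.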
Similar to this example, other metrics have other weaknesses. It is therefore often cumbersome to evaluate the quality of a forecast, particularly in the presence of a wide range of business requirements.
In general, how well a forecast adheres to such business requirements is more crucial to the success of a forecasting product than its raw performance with regard to previously defined metrics. Consider Figure 3 below as an example.
Here, we see the demand and two forecasts that capture the nature of the demand almost perfectly. Forecast 2 is more accurate in this case (i.e., closer to the actual demand); however, it consistently under-forecasts the demand. In contrast, Forecast 1 is less accurate but consistently over-forecasts. Simply applying a metric like RMSE would judge in favor of Forecast 2, yet Forecast 1 may be closer to what the business actually wants.
Figure 3: Exemplary demand versus two different forecasts. Forecast 2 is more accurate than Forecast 1.
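One way to make this preference measurable is an asymmetric evaluation metric. The sketch below uses a quantile (pinball) loss with tau above 0.5 so that under-forecasting is penalized more heavily; the demand values and the choice of tau = 0.8 are illustrative assumptions, not figures from the article.

```python
import numpy as np

# Figure 3 in numbers (values illustrative): an accurate but consistently
# low forecast vs a slightly less accurate but consistently high one.
demand = np.array([100.0, 110.0, 120.0, 130.0])
under  = demand - 3    # Forecast 2: closer, but always below demand
over   = demand + 5    # Forecast 1: further away, but always above

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def pinball(y, yhat, tau=0.8):
    # Quantile (pinball) loss: with tau > 0.5, under-forecasting
    # (positive error) costs more than over-forecasting.
    err = y - yhat
    return np.mean(np.maximum(tau * err, (tau - 1) * err))

# RMSE favors the under-forecast; the asymmetric loss favors the
# over-forecast, matching the business preference described above.
```

Swapping the evaluation metric like this is often the first step before touching the model itself.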
If we look back at the exemplary car components listed above for the VW ID Buzz, many of them require semiconductors. For the past few years, these parts have been particularly hard to source, so the business may want to artificially increase the demand forecast to secure more of the required chips. The business will therefore rather over-forecast than under-forecast these particular products, though probably not the entire product range.
Another example of such a business requirement is that of strategic forecasting. Usually, an enterprise will plan the high-level sales goals for different time buckets, and often a demand forecast is expected to adhere to that goal. This adds additional constraints to a model as it needs to operate inside a certain corridor that respects the overall business goal.
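One simple way to implement such a corridor is a postprocessing step that rescales the raw forecast to hit the strategic total while clipping each step to an allowed band around the model output. The goal value and corridor width below are hypothetical parameters, and this is a sketch of the idea rather than a production implementation.

```python
import numpy as np

def align_to_goal(forecast, goal, corridor=0.15):
    """Scale a forecast so its bucket total matches a strategic goal,
    keeping each step within +/- corridor of the raw forecast."""
    scaled = forecast * goal / forecast.sum()  # hit the bucket total
    lower = forecast * (1 - corridor)          # per-step guard rails
    upper = forecast * (1 + corridor)
    return np.clip(scaled, lower, upper)

raw = np.array([80.0, 100.0, 120.0])     # raw quarterly forecast, sum = 300
adjusted = align_to_goal(raw, goal=330)  # hypothetical strategic goal
```

If the clipping binds, the total will deviate from the goal, which is usually the desired behavior: the corridor expresses how far the business is willing to bend the model's forecast.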
To overcome these issues there are a couple of best practices that we have developed alongside many forecasting use cases, including the use of custom loss functions for model optimization and several postprocessing techniques.
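As a flavor of the custom-loss idea, here is a sketch of an asymmetric squared loss in which under-forecasts are penalized `alpha` times more than over-forecasts. The returned gradient and hessian follow the (grad, hess) convention used by LightGBM and XGBoost custom objectives; the exact plug-in signature depends on the API flavor, and `alpha=3.0` is an illustrative choice.

```python
import numpy as np

def asymmetric_l2(y_true, y_pred, alpha=3.0):
    """Squared loss whose gradient is `alpha` times stronger when the
    model under-forecasts (pred < actual) than when it over-forecasts."""
    residual = y_pred - y_true
    weight = np.where(residual < 0, alpha, 1.0)  # heavier when under-forecasting
    grad = 2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess

grad, hess = asymmetric_l2(np.array([10.0, 10.0]), np.array([8.0, 12.0]))
# The under-forecast (8 vs 10) receives a 3x stronger gradient than the
# over-forecast (12 vs 10) of the same magnitude.
```

Trained with such an objective, a boosting model drifts toward the over-forecasting behavior the business asked for in the semiconductor example, without any postprocessing.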
In my ODSC West Tutorial, I will cover the problems mentioned in this blog post in a more practical and detailed way. This includes among other topics:
- How to efficiently formulate a machine learning problem for direct time series forecasting. We will look into actual code that will create the appropriate data format that serves as an input to the best-of-class model.
- How to steer the forecast to match business requirements. We will see code from a real-world use case and how we apply custom loss functions to our models to adhere to expected volume ranges.
- How to ensure adoption of the forecast by its end users with the help of explainable AI.
For additional best practices that will be discussed in that talk please have a look at the session page!
Bio: David Koll is a Senior Data Scientist at Continental Tires, Germany. He holds a PhD in Computer Science from the University of Göttingen with research visits to the University of Oregon (USA), Uppsala University (Sweden), and Fudan University (China). Most of his academic work was involving analyses of social media. Since joining Continental in 2018 he has developed different analytical solutions that are now running in production, with a focus on both forecasting and Industry 4.0.