Regression Blog 2: We’re Practically Giving These Regressions Away

When I heard that they would be releasing Pumpkin Spice Spam, I thought of regression. This might seem like a leap, but bear with me. In the U.S. in the last few years, news reports of unusual Pumpkin Spice-flavored products have come to signal the unofficial end of summer, or at least “Summer Time” as I conventionally think of it¹. Fall is often a busy and quick-passing season, followed immediately by the pleasant bridge between fall and winter known as the Holiday Season.

A part of the holiday season is Black Friday, the day after Thanksgiving, when many retailers open early and offer deep discounts. The “Black Friday” name is commonly understood to mark the first day of the year that many retailers become profitable, or “in the black².” This is a critical time for a retailer, and as such, a retailer would want to predict how much customers are likely to spend. As an exercise, let’s see how well we can predict Black Friday sales with RAPIDS by participating in a Black Friday prediction contest hosted by AnalyticsVidya. We’re going to try three different kinds of regularized regression, and if none of those work, something else. We’ll evaluate each approach using Root Mean Squared Error (RMSE).
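RMSE is the square root of the mean squared difference between predictions and actual values; lower is better, and it stays in the same units as the target. A minimal sketch in plain NumPy:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Each prediction is off by exactly 10, so the RMSE is 10.0.
print(rmse([100.0, 200.0, 300.0], [110.0, 190.0, 310.0]))  # 10.0
```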

[Related Article: How Developers are Driving Innovation Through Open Source Economy]

1 — Ridge Regression

We’re going to begin with Ridge regression, a technique I used in the last blog I wrote. It worked well last time, so it’s worth trying here. Ridge regression uses what’s known as L2 regularization, which penalizes the sum of the squared coefficients and shrinks them toward zero.

As you can see in the embedded notebook, this model performs laughably badly. Five months ago (when I wrote the first Regression blog), we would have had to go back and do extensive feature engineering or variable transforms. cuML has come a long way since then, however, and before I invest in a lot of feature engineering, I’m going to try some different models.
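For readers who haven’t seen Ridge before, here’s a minimal sketch on synthetic stand-in data (the real competition features are in the notebook). It uses scikit-learn’s interface; cuML exposes a near-identical `Ridge` estimator, so on a GPU you would essentially just swap the import:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Hypothetical synthetic data, standing in for the Black Friday features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.1, size=500)

# alpha controls the strength of the L2 penalty on the coefficients.
model = Ridge(alpha=1.0)
model.fit(X, y)
train_rmse = mean_squared_error(y, model.predict(X)) ** 0.5
print(f"RMSE: {train_rmse:.3f}")
```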


2 — Lasso Regression

Lasso regression is similar to Ridge but uses a kind of regularization known as L1, which penalizes the sum of the absolute values of the coefficients (we will see a method next that combines them both)³. I don’t expect Lasso regression to do much better, but it won’t take a lot of work to try. Essentially, I can see how well it performs for the price of a copy/paste and a couple of find/replace operations. As you can see in the notebook, this model performs even worse than Ridge regression. This could be a result of Lasso forcing coefficients with small impact on the model to zero.
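That zeroing-out behavior is easy to see on synthetic data (a hedged sketch, again using scikit-learn’s interface as a stand-in for cuML’s matching estimator): when only some features carry signal, the L1 penalty drives the coefficients of the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
# Only the first two features actually influence the target here.
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=500)

# The L1 penalty shrinks small coefficients all the way to zero.
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)  # the last three coefficients land at (or very near) zero
```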

3 — ElasticNet

ElasticNet combines both L1 and L2 regularization, with a hyperparameter to control the trade-off between the L1 and L2 penalties. This means even a simple hyperparameter search takes a little more work, since we’re looking for the best combination of two hyperparameters. As you can see in the notebook, it performs the same as Lasso, which is to say, very poorly. If our jobs depended on making a good prediction at this point, we probably wouldn’t expect a very happy new year. Time to try something else.
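A two-hyperparameter search can be as simple as a nested loop. Here’s a hedged sketch on synthetic stand-in data using scikit-learn’s `ElasticNet` (cuML mirrors this estimator), varying both the overall penalty strength and the L1/L2 mix:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.1, size=500)

# Search over alpha (penalty strength) and l1_ratio (L1 vs. L2 mix).
best = None
for alpha in (0.01, 0.1, 1.0):
    for l1_ratio in (0.1, 0.5, 0.9):
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        model.fit(X, y)
        score = mean_squared_error(y, model.predict(X)) ** 0.5
        if best is None or score < best[0]:
            best = (score, alpha, l1_ratio)

print(f"best RMSE {best[0]:.3f} at alpha={best[1]}, l1_ratio={best[2]}")
```

In real use you would score each combination on a held-out validation split rather than the training data, and a tool like `GridSearchCV` can manage the loop and cross-validation for you.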

Getting Irritated and Using XGBoost

One of the reasons XGBoost was among the first methods implemented for RAPIDS is its long history of success. Having tried three methods and still having huge RMSEs, I’m going to use a method that has won many competitions out of the box.

As you can see in the notebook, my minimally-tuned XGBoost model dramatically outperforms everything else we’ve tried. Like a holiday movie, we started out using what we knew had worked before, only to have those hopes scuttled. But just about when we were going to give up, a miracle happened and everything worked out.

If you’re interested in learning more, I suggest you read this blog by Rory Mitchell on the RAPIDS team. He’s not only one of the core contributors to XGBoost, but he’s great at explaining how and why it works as well as it does.

[Related Article: Some Details on Running xgboost]


Even though the three traditional linear models we tried didn’t work very well, we got some good experience working with them. I hope you enjoyed this walk through, and if you use these techniques in your work, moving to RAPIDS will let you train your models much faster than CPU-based packages.

It is easier than ever to get started using RAPIDS for your data science workflows. Check out all of our resources for getting started, no GPU required. Thanks!

¹While it was long said that the Labor Day holiday in the U.S. was the “unofficial end of summer,” many children are back in school before then, and university move-ins can happen earlier as well. You normally hear about, say, Pumpkin Spice deodorant or sneaker insoles in early August.

²Doing research for this blog, I learned the term has been used since the 1950s but has had three different meanings. See: https://www.visualthesaurus.com/cm/wordroutes/the-origins-of-black-friday/

³Lasso, like “laser,” is an acronym that eventually evolved into its own word.

Originally Posted Here