In a previous post, I wrote about Throne AI, a sports prediction platform or “Kaggle for sports.” If you’re a sports fan and interested in using your machine learning abilities to predict the outcome of sports matches, then I highly recommend you sign up for Throne AI.
After becoming obsessed with the platform, I wanted to know more about how it was created, what its future looks like, and the mind behind it. In this article, I interview the founder of Throne AI, Ross Taylor.
The following transcript has been edited and condensed for clarity.
First off, tell us about your background, your education and work experience.
My academic background is in econometrics and statistics, where I did my research on modeling time series data at Cambridge a few years back. I’ve worked as a quantitative analyst both in finance and statistical consulting, which involves using machine learning to build predictive models that can be used to find edges in markets.
Given your full-time job, it must be challenging to work on Throne AI in your spare time?
It is time-consuming and involves evening and weekend work. The main priority for me right now is to get core features in place, which justifies the heavy workload in the short-term. In the long-term, it means moving more to a maintenance role which should be less time-consuming. The counterweight to the workload is the fact I get energy doing the work because it’s enjoyable, so in that sense it hasn’t been too problematic.
So would you say it’s in a beta mode?
It’s a proof of concept right now. I would start thinking about a beta once incentives are in place for users to participate – prizes or otherwise. I think modeling sports is inherently fun, but to make the platform sustainable, it’s important to allow people to be rewarded for making good models. That’s where I will be devoting most of my time in the upcoming months: not just stabilizing what I have, but actually getting an end-to-end system where people can make models and be rewarded for them.
Take me through how you crafted the platform and explain how you maintain it as well.
I had experience in using machine learning and statistics for modeling sports, and I was aware of quite a strong hobbyist interest in the data science community. What was missing was a central hub for sports modeling that would give users tools, data, and competitions for the quantitative modeling of sports.
Maintaining the website is basically a process of ensuring that the data sources continue to be reliable, that the relevant nightly updates for competitions and rankings are completed, and so on.
Tell me more about the data. Where does it come from? What sort of wrangling, munging, and feature engineering is applied to it?
There are several different sources of data. The core data is the results data which is publicly available from a number of sources, so scores for games in a number of sports leagues, such as the English Premier League, NFL, and so on.
There is then supplementary publicly available in-game data on top of this results data. This can include in-game statistics, such as shots and corners for football, as well as information like team line-ups for sports (i.e. which players are playing or expected to be playing in a game).
There is a separate source of data which is the transformations applied through feature engineering – since, on Throne, we give users a set of default predictive features to get started with modeling. The emphasis is now on users to engineer their own features, which can be using their own data, or data through the Throne API.
There is still a role for the platform in the future in teaching users how to do effective feature engineering for sports.
What have you learned from making Throne AI?
I already had a strong base in sports modeling, but for sure I learned a lot more about modeling sports from building a platform as opposed to just making models. I think the main lesson was getting a sense of what was truly important to a user since a lot of the features I thought were more important turned out to be less important, and those features I thought were less important turned out to be more important.
The other thing I learned is how to help educate users to think about making good models in an environment like Throne where you are competing against public predictions. To give you an example, a lot of users initially think they should be “backing” the team to which they give the greater probability. For example, in a match between Manchester City and Swansea, they might think City will win 85% of the time, so they may initially think they are “backing” City.
But in fact, it depends on what the public prediction is. If the public gives City a 90% chance, then they will be backing Swansea – because of the relative probability difference. Now I thought it would be obvious from this who the user is backing, but little things like this actually are one of the main things which confuse initial users. So really the lesson here for me was to understand where the user would be approaching these problems afresh, which meant I had to restructure things so the user is aware of what their objective is and how they will be ranked for their predictions.
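The point about relative probabilities can be sketched in a few lines of Python. This is only an illustration of the idea in the answer above, not Throne's actual logic: the function name `backed_side` is hypothetical, and the match is simplified to two outcomes (the draw is ignored).

```python
def backed_side(user_prob_city: float, public_prob_city: float) -> str:
    """Return the side a prediction effectively backs relative to the public.

    Both arguments are probabilities that City wins. This two-outcome
    simplification is an assumption made for illustration only.
    """
    if user_prob_city > public_prob_city:
        return "City"
    if user_prob_city < public_prob_city:
        return "Swansea"
    return "neither"


# The user gives City 85%, the public gives City 90%.
print(backed_side(0.85, 0.90))
```

Even though the user thinks City is the clear favourite, the prediction sits below the public's, so relative to the benchmark the user is on Swansea's side.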
I want to talk more about your scoring system. What is the reasoning for not just going with log loss, but the relative log loss?
I’d say the way to frame it is in terms of what the platform’s trying to achieve, and what the users are trying to achieve. The mission of the site is to get data scientists, who may not have prior experience in sports modeling, and enable them to make predictions that are better than consensus predictions in the wider public. So users are competing versus a public benchmark, where public means a prediction obtained from an outside source such as an odds line.
This is why we get users to think in terms of relative log loss instead of absolute log loss because what matters is having a lower log loss than the public. The other way to think about this is that the platform isn’t simply about making the perfect model of a sport like football, and trying to build it in isolation from other predictions. Rather, it may be that a good model starts with publicly available information and finds tactical edges around this – for example, maybe the public prediction underestimates or overestimates certain factors. In this way, model development need not be an absolute thing of building an entire model of football and integrating every single variable, but rather, a more tactical task of finding what information, in particular, is undervalued/overvalued, such as players being overrated/underrated, team form or hot-hands being overrated/underrated, and so on.
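As a rough sketch of what “relative” means here, the snippet below assumes relative log loss is the user's log loss minus the public's log loss, averaged over games. That definition and the function names are assumptions for illustration; the exact formula Throne uses is not given in the interview.

```python
import math


def log_loss(prob_of_actual_outcome: float) -> float:
    """Log loss for one game, given the probability assigned to what happened."""
    return -math.log(prob_of_actual_outcome)


def relative_log_loss(user_probs, public_probs):
    """Mean of (user log loss - public log loss) over games.

    Each list holds the probability assigned to the outcome that actually
    occurred. Negative values mean the user beat the public benchmark.
    This definition is an assumption, not Throne's documented formula.
    """
    diffs = [log_loss(u) - log_loss(p) for u, p in zip(user_probs, public_probs)]
    return sum(diffs) / len(diffs)


# City vs Swansea: user says City 85%, public says City 90%.
if_city_wins = relative_log_loss([0.85], [0.90])     # user did worse than public
if_swansea_wins = relative_log_loss([0.15], [0.10])  # user did better than public
print(if_city_wins, if_swansea_wins)
```

Under this assumed definition, the user only gains on the public when Swansea wins, which matches the earlier point that an 85% prediction against a 90% benchmark is effectively a bet on Swansea.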
What are your recommendations for first-time users?
The easy thing you can initially do is just try a bunch of scikit-learn algorithms with the data and see what works. In some senses, that’s a naive approach, but it’s a good way just to get familiar with the data. More generally, just doing a high-level analysis of the data is good, so looking at correlation matrices, feature distributions, and deciding what you are going to be predicting.
The “what you are predicting” question is surprisingly important, and requires a judgment from the user. For example, you can view a sports result as a classification or regression task. You can think of it in terms of classifying a home or away team win. Alternatively, you can formulate a regression problem by predicting a point spread; e.g. a 41-33 win in the NFL would be a spread of +8 and you could try to model that data, and from that prediction, try to manufacture probabilities.
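One common way to “manufacture” a probability from a predicted spread is to assume the actual margin is normally distributed around the prediction. The sketch below does that; the normality assumption, the function name, and the default standard deviation of 13.5 points are all illustrative assumptions (in practice you would estimate the spread of your own model's residuals), and ties are ignored.

```python
import math


def home_win_probability(predicted_spread: float, spread_sd: float = 13.5) -> float:
    """Probability that the home margin is above zero.

    Assumes actual margin ~ Normal(predicted_spread, spread_sd). Both the
    distribution and the default spread_sd are placeholder assumptions.
    """
    z = predicted_spread / spread_sd
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A model that predicts the home team to win by 8 points.
print(round(home_win_probability(8.0), 3))
```

A predicted spread of zero gives 50%, and larger predicted margins push the probability toward one at a rate controlled by `spread_sd`.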
Why use regression to model something like NFL scores instead of classification?
It’s a good question. Let’s imagine we have 10 years of NFL data. The data we’re going to have is the score of team one and the score of team two. Classification would need a target variable which denotes 0 or 1 based on whether the home team wins or not. Say for example the home team has 35 points and the away team 20 points – then we assign a value of 1 for the home team, 0 for the away team.
The regression approach may be more efficient because it uses more information. A blowout result where a team wins by a lot of points is much more informative of relative strengths than a game where a team wins by a narrow margin. The classification approach would treat both results the same and throw away information. So in this sense, the regression approach may be a better option. There is a caveat to this, which is that you can incorporate points information (such as blowouts) as features in a classification problem, but in general, you would want to utilize as much information as possible.
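To make the difference concrete, here is a small sketch that builds both target variables from the same scores. The 35-20 game comes from the answer above; the other two scores are made up for illustration.

```python
# Each game is (home points, away points). Only the first score is taken
# from the interview; the rest are invented examples.
games = [
    (35, 20),  # comfortable home win
    (21, 20),  # narrow home win
    (17, 24),  # away win
]

# Classification target: 1 if the home team wins, 0 otherwise.
class_targets = [1 if home > away else 0 for home, away in games]

# Regression target: the point spread from the home team's perspective.
spread_targets = [home - away for home, away in games]

print(class_targets)
print(spread_targets)
```

The first two games get the same classification label even though the margins are 15 points and 1 point, which is the information the regression target keeps and the classification target discards.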
That makes sense for NFL and the NBA where there are two outcomes, but what about soccer and hockey where there are three outcomes?
At a simple level, you might do a multinomial regression if you are treating it as a classification problem. If you are doing a regression approach, then Poisson regression is a good option to look at. By modeling the two scores as count data, you can derive the probabilities for home, draw and away from the “Poisson grid” implied by the model. The Dixon and Coles paper about modeling soccer games is the main reference for this approach.
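A minimal sketch of the “Poisson grid” idea follows. It assumes the two teams' goal counts are independent Poisson variables with known rates, which is a simplification: it does not include the adjustments made in the Dixon and Coles paper, and the rates used in the example are hypothetical rather than fitted from data.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities from a score grid.

    Assumes independent Poisson goal counts. The grid is truncated at
    max_goals per team and renormalised so the three values sum to one.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return home / total, draw / total, away / total


# Hypothetical expected goals: 1.8 for the home side, 1.1 for the away side.
print(outcome_probabilities(1.8, 1.1))
```

In a real model the two rates would come from a fitted Poisson regression on team strengths and home advantage; here they are just plugged in to show how the grid turns score probabilities into home, draw, and away probabilities.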
I hope you found this interview with Ross Taylor as informative as I did. I’ve had a lot of fun using the platform and I think that even non-sports fans have something to learn from it. It’s a project I’ll continue to use and monitor in the near future and I think it has the potential to make a huge impact in the data science community.