Summer is here! All across Europe, temperatures are soaring, and after a long and difficult year of the pandemic, people are making the most of the good weather. And sports go hand-in-hand with summer. It’s certainly been an exciting and controversial summer of sports so far this year, but the event I’ll be watching most closely is more normally associated with muddy puddles than hot summer evenings. I am, of course, talking about rugby.
The British & Irish Lions is an invitational team made up of the best players from the “four nations”: England, Scotland, Wales, and Ireland. The side comes together once every four years to tour against one of the top Southern Hemisphere nations: New Zealand, Australia, or South Africa. This year, the Lions have traveled to South Africa, where they will face the current World Champions in a grueling series of three test matches, as well as warm-up games against some of the regional sides.
So, what does this have to do with data analytics? A few weeks ago, I was scrolling through social media and came across a sports pundit’s predictions on the starting line-up for the first test match in the series, on July 24th. My first thought was that it definitely wasn’t the line-up I’d have picked or expected, and my second thought, rather sadly, was “I wonder if we could pick something better using analytics”.
I’d seen other demonstrations and examples in sports analytics, such as picking a fantasy-league team using optimization procedures subject to a budget constraint. But picking the Lions line-up by the numbers is a rather different task.
Firstly, we are picking the best starting line-up from a squad of players who have been shortlisted. They’ve already boarded the plane to South Africa and have been training in the camps readying themselves for selection. There is no “budget” constraint here – the coach simply has to pick the best team to face South Africa from a fixed pool of players. This line-up is likely to change between the different tests based on evolving tactics, injuries, and second-guessing which side “The Boks” will choose.
The goal of this exercise was to come up with a model that generates a starting line-up based on performance data and analyst preference weightings. I wanted to produce a model flexible enough that it could be tweaked to generate the best side even if playing preferences change. In simulation and optimization, this is often referred to as a sensitivity analysis. It has a very practical application for this dataset too: if the coach wanted to re-run the model to select a side geared toward pure attacking or defensive play, one would only need to change the parameter weights and re-select the side.
In this blog, we also look at how the selections change between the first and second tests as squad availability changes.
You can read the technical deep dive blog from the first test match here: https://communities.sas.com/t5/SAS-Communities-Library/Using-SAS-Viya-to-Select-the-British-amp-Irish-Lions-Rugby-Team/ta-p/755754
Player selection is done via a two-fold approach. First, the PROMETHEE algorithm produces a weighted preference ranking for each position from a set of inputs, giving a complete preference ranking of every eligible player for every position. Since many players cover multiple positions, those players get a rank for each position they can cover.
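To make this first step concrete, here is a minimal, pure-Python sketch of PROMETHEE II using the simple “usual” preference function (any positive difference counts as full preference). The player names, metrics, and weights below are invented for illustration; the actual model runs in SAS with a richer set of inputs.

```python
# Minimal PROMETHEE II sketch with made-up data (the real model's
# inputs and weights differ).

def promethee_ii(scores, weights):
    """Rank alternatives by net outranking flow.

    scores:  {name: [criterion values]}, higher is better
    weights: criterion weights summing to 1
    """
    names = list(scores)
    n = len(names)

    def pref(a, b):
        # Weighted preference of a over b: sum the weights of every
        # criterion where a strictly beats b ("usual" criterion).
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    flows = {}
    for a in names:
        pos = sum(pref(a, b) for b in names if b != a)  # leaving flow
        neg = sum(pref(b, a) for b in names if b != a)  # entering flow
        flows[a] = (pos - neg) / (n - 1)                # net flow
    return sorted(names, key=lambda a: -flows[a]), flows

# Toy example: three fly-half candidates scored on
# [kicking accuracy, metres gained, tackles made]
scores = {
    "Player A": [0.90, 60, 12],
    "Player B": [0.80, 75, 10],
    "Player C": [0.85, 55, 15],
}
weights = [0.5, 0.3, 0.2]  # analyst preference weights
ranking, flows = promethee_ii(scores, weights)
```

Changing the `weights` vector and re-running is exactly the kind of sensitivity analysis described above: shifting weight onto, say, tackles made can reorder the ranking without touching the underlying data.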
Second, once we have the complete rankings by position, we reduce the list by solving a Linear Assignment Problem, formulated as the minimum-weight matching in a bipartite directed graph.
This process is run twice: first for the starting line-up (positions 1-15) and then again for the remaining players to fill out the bench (positions 16-23). The only difference between the two passes is a change to our preference weightings. When picking the bench, we increase the weighting for the number of positions a player can cover, since we often want utility players on the bench to cover possible match injuries.
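As an illustration of the assignment step, the toy sketch below brute-forces the minimum total rank over a tiny player pool (Python rather than SAS, with invented players and ranks). The real model solves the same Linear Assignment Problem at full-squad scale with a proper solver rather than enumeration.

```python
from itertools import permutations

# Toy linear assignment: give each position the player whose
# position-specific preference rank minimizes the total rank.
# Players and ranks are invented for illustration only.

ranks = {
    # (player, position): preference rank (1 = best)
    ("Smith", "scrumhalf"): 1, ("Smith", "flyhalf"): 3,
    ("Jones", "scrumhalf"): 2, ("Jones", "flyhalf"): 1,
    ("Brown", "scrumhalf"): 3, ("Brown", "flyhalf"): 2,
}
players = ["Smith", "Jones", "Brown"]
positions = ["scrumhalf", "flyhalf"]

# Enumerate every way of assigning distinct players to the positions
# and keep the assignment with the lowest total rank.
best_cost, best_assignment = min(
    (
        (sum(ranks[(p, pos)] for p, pos in zip(perm, positions)),
         dict(zip(positions, perm)))
        for perm in permutations(players, len(positions))
    ),
    key=lambda t: t[0],
)
```

Running the second pass for the bench would simply recompute `ranks` with the re-weighted preferences (boosting positional versatility) and solve again over the players not already assigned.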
Of course, as with any model, it serves best as a data-driven guide to business decisions, and should be seen as a decision-making tool rather than the definitive optimal solution. I feel this is particularly pertinent to selecting a team, because strategic choices rely on subject matter expertise, and many of the variables in the model have a subjective element. For example, choosing the best goal kicker purely on kicking accuracy is misleading: factors such as the point in the game, wind, and angle all influence accuracy. How you define kicking accuracy therefore depends on what you are looking for in a player.
The same can be said of over-relying on quantitative metrics for player selection in isolation. Take the flyhalf position, for example. In many of the models, Owen Farrell is selected as the starting 10, and the reason is simple: he has many international caps, previous Lions caps, and very strong average performance metrics, and he is well known for his laser-precision kicking. Where he often comes under scrutiny is in being less creative in distribution and playmaking than other flyhalves; most recently he has been playing at 12 for England, with the playmaking typically done by George Ford. It goes without saying that a mathematical model selects players by looking at the numbers. If you cannot define a metric for creativity or flair, it won’t be considered when modeling player selection.
The final solution for this project was an interactive dashboard with several sides pre-selected, plus the ability to bring in a new set of analyst preferences to re-select the side dynamically. Figure 1 shows the interactive report built in SAS Visual Analytics. It shows pre-run selections for varying preference weights associated with team-level attributes such as Attacking, Defensive, Youthful, and Experienced.
The value of this dashboard is in seeing how the model picks players based on league and international performance. The ‘Attacking’ side makes for an interesting line-up. It includes ‘bolter’ players who’ve not had much international experience but have lit their domestic leagues on fire this season, most notably Sam Simmonds and Louis Rees-Zammit. Ali Price is also selected at scrumhalf; interestingly, he was selected for the 1st test match ahead of Conor Murray to earn his first-ever Lions cap.
What also makes this report interesting is that every player in the squad was a contender for the starting line-up, and every fan will have a different view of who deserves the starting spot. The beauty of this dashboard is that, depending on how the parameters are set, every player gets a fair evaluation.
Figure 1 – Team Selection Simulations Dashboard
For this project, I was only able to use open datasets readily available on the internet. Performance statistics were gathered at a player level for the Six Nations Championship for the last three years (2019-2021). The Six Nations is an annual international competition between the top Northern Hemisphere sides: England, Wales, Scotland, Ireland, France, and Italy.
As well as international performance stats it was important to bring in domestic league performance. I gathered domestic league data for the current season for the English Gallagher Premiership and the United Rugby Championship (formerly the Pro14).
I further augmented this data with some open datasets from sites like Wikipedia for individual player bios (height, weight, position, etc.)
Working with public data has its limitations: there were some differences in the stats recorded between leagues, and open data always carries the risk of missing or incorrect values. Open data is also a drop in the digital ocean of real-life sports data, given that players wear GPS trackers recording precise metrics for every moment of play. With a richer dataset, these selections could be very different.
Rugby is also an incredibly physically demanding game, and because of this players are regularly injured. The model and dataset are based on the most recent squad selected by the coach, Warren Gatland.
Even as I write this blog it is possible that the squad selections are fluctuating, with players recovering and picking up injuries. Most recently Finn Russell looks like he might be back in contention after recovering from an Achilles injury, and Wyn Jones is now a doubt with a shoulder injury.
The Simulation Dashboard
Since there is a strong element of subjectivity in the model process, it makes sense to move it out of a code-oriented environment and into the hands of the subject matter experts directly. To show how this might work, we use an interactive dashboard in SAS Visual Analytics that calls the Job Execution Service behind the scenes to run the SAS model interactively.
In Figure 2 you can see how the SAS code is seamlessly blended into the Visual Analytics dashboard using an HTML interface. Because there are so many parameters to adjust, it made sense to let analysts upload a CSV file of preferences rather than manually edit each cell in the browser. Analysts can simply tweak the variables, upload a file to the interface, and re-run the simulation.
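The exact layout of the project's preferences file isn't shown in this post, but a weights CSV of this kind might look something like the following hypothetical sketch (Python for illustration; both the column names and the metric names are assumptions):

```python
import csv
import io

# Hypothetical preference-weights CSV an analyst might upload.
# The real file's columns and metrics are not documented here.
weights_csv = """metric,weight
tries_scored,0.4
metres_gained,0.3
tackles_made,0.2
lineouts_won,0.1
"""

reader = csv.DictReader(io.StringIO(weights_csv))
weights = {row["metric"]: float(row["weight"]) for row in reader}

# Normalize so the weights sum to 1 before re-running the simulation,
# so analysts don't have to balance the file by hand.
total = sum(weights.values())
weights = {metric: w / total for metric, w in weights.items()}
```

Normalizing on ingest is a small but useful safeguard: an analyst can enter raw relative preferences and the model still receives a proper weight vector.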
Figure 2 – Job Execution Service Front End in Visual Analytics
The simulation output shows in a new browser tab so you can validate that the model has run. A nice feature as well is the ability to download a PowerPoint report from the Job Execution Service output. This PowerPoint file is produced using ODS while the simulation script runs in the backend.
Figure 3 – Job Execution Service Output with Download Link
Figure 4 – PowerPoint Report Download Output
We can then interactively compare the results of multiple model simulations in a Visual Analytics tab. Each time the model runs, the result is added to a grouped table labeled by its weights file. In Figure 5 you can see I have multiple sets of model weights depending on the type of team I am looking to select, and in Figure 6 you can see the interactive report output where we visualize the selection. The starting line-up is displayed with a custom layout to mimic the field positions.
Figure 5 – Model Weight Files
Figure 6 – Interactive Dashboard for Model Simulations
How well did the SAS model perform?
I ran five simulations based on differing preference weights: Attacking, Defensive, Experienced, Neutral, and Youthful.
Looking at the player selection counts, several players were selected in many of the scenarios, as shown in Figure 7.
Figure 7 – Player Selections across models
So, how does the model perform against the actual team selection?
The actual team selected for the 2nd test match on Saturday 31st of July is:
- Mako Vunipola (selected 4/5 times in the SAS models)
- Luke Cowan-Dickie (selected 2/5 times in the SAS models)
- Tadhg Furlong (selected 5/5 times in the SAS models)
- Maro Itoje (selected 5/5 times in the SAS models)
- Alun Wyn Jones (selected 0/5 times in the SAS models)
- Courtney Lawes (selected 5/5 times in the SAS models)
- Tom Curry (selected 3/5 times in the SAS models)
- Jack Conan (selected 1/5 times in the SAS models)
- Conor Murray (selected 2/5 times in the SAS models)
- Dan Biggar (selected 4/5 times in the SAS models)
- Duhan Van Der Merwe (selected 1/5 times in the SAS models)
- Robbie Henshaw (selected 5/5 times in the SAS models)
- Chris Harris (selected 1/5 times in the SAS models)
- Anthony Watson (selected 2/5 times in the SAS models)
- Stuart Hogg (selected 4/5 times in the SAS models)
- Ken Owens (selected 2/5 times in the SAS models)
- Rory Sutherland (selected 5/5 times in the SAS models)
- Kyle Sinckler (selected 5/5 times in the SAS models)
- Tadhg Beirne (selected 5/5 times in the SAS models)
- Taulupe Faletau (selected 3/5 times in the SAS models)
- Ali Price (selected 5/5 times in the SAS models)
- Owen Farrell (selected 4/5 times in the SAS models)
- Elliot Daly (selected 5/5 times in the SAS models)
Looking at the histogram of selection frequencies in Figure 8, almost 40% of players in the actual team (9 of 23) were selected in every SAS model. The model appears to work fairly well, and fewer than 20% of players (4 of 23) were selected at most once across the SAS models.
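Those percentages can be verified directly from the selection counts listed above (a quick Python tally; the dashboard itself computes this in SAS):

```python
from collections import Counter

# Selection counts (out of 5 SAS model runs) for the 23 players in
# the actual 2nd-test squad, in the order listed above.
counts = [4, 2, 5, 5, 0, 5, 3, 1, 2, 4, 1, 5,
          1, 2, 4, 2, 5, 5, 5, 3, 5, 4, 5]

freq = Counter(counts)
pct_all_models = freq[5] / len(counts)                 # selected in every model
pct_once_or_less = (freq[0] + freq[1]) / len(counts)   # selected at most once
```

This gives roughly 39% of the squad selected by every model and about 17% selected at most once, matching the figures quoted.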
Figure 8 – Overall Model Performance for 2nd test match
Compared with the selections generated for the 1st test match, shown in Figure 9, more than 30% of players were selected in every single model. Given that far less data was available than sports teams would actually have access to, the models perform surprisingly well for both test matches.
Figure 9 – Overall Model Performance for 1st test match
Comparing the results of the SAS-driven selections for the 2nd test match, we can see that the model still performs well, though the exclusion of Alun Wyn Jones stands out. It may be that the model is introducing bias into the selection: he deviates the most from the mean age of the group, despite having stellar performance statistics and a wealth of experience. One benefit of moving towards data-driven decisions is removing human bias where possible, so this is an important illustration of why model inputs must be chosen carefully and model results scrutinized to unpick any issues.
Overall, the SAS models provided a reasonable simulation of the teams. Given the relative lack of data, I was not expecting it to correctly pick the full line-up, but the models actually do a very good job – especially when considered in aggregate. This re-iterates the point I made in the introduction: a model is best served as a guide, with which you can then make an informed decision.
Reading the press release from Warren Gatland, this is also how he picks his team: he does not select the full side himself, but asks the rest of the coaching staff to come up with their own 23, and then they compare and debate the results.
The full press release on how they made their selections is here: https://www.lionsrugby.com/2021/07/21/lions-selection-for-first-test-hardest-ever-for-gatland/
For further information on our offering on Sports Analytics, please visit our website Sports Analytics with SAS
About the author
Harry Snart is part of the Public Sector Customer Advisory team at SAS UKI. He has an academic background in economics and data analytics and has experience in advanced analytics, business intelligence, and cloud computing, including open-source data analytics with R, Python, and SQL. Visit Harry’s previous blogs on SAS blogs.