# How to Play Fantasy Sports Strategically (and Win)

Posted by Paxtyn Merten, November 30, 2018

Daily Fantasy Sports (DFS) is a multibillion-dollar industry with millions of annual users. Martin Haugh of Imperial College Business School created a framework to beat those users by modeling what they’ll do and constructing a team based on that model. Haugh presented his research on how to play Fantasy sports strategically at ODSC Europe 2018. He modeled his framework on American football, and his presentation focused on “double-up” contests, where all DFS players who score above a certain benchmark receive the same reward. Though Fantasy players can usually submit up to *N* teams, Haugh demonstrated the model constructing a single portfolio.


**Problem Formulation**

Haugh wanted to use statistics to produce a Fantasy football team that would finish in the top 20 percent of entries.

DFS teams each have nine players, chosen out of *P* real-world athletes. Each athlete receives a weekly score based on their performance. These scores, δ, are unknown to DFS players until the week’s games wrap up. In the research, δ is represented as a *P*-dimensional vector, or $\delta \in \mathbb{R}^{P}$.

The model has to select a team *w* of athletes, represented as a *P*-dimensional vector of 0s and 1s, or $w \in \{0,1\}^{P}$. So, if there are 300 athletes, *w* will be a vector of 300×1 dimensions, Haugh explained. In the team *w*, only nine of those positions will be 1, representing the model-selected players.

Once the week’s real-life NFL games are over, a Fantasy team’s score *F* is a result of the athletes’ earned scores, represented by the following equation:

$$F := w^{\top}\delta$$

There are some constraints on which NFL players the model (and Fantasy players) can choose. Each athlete costs a certain amount, and each team has a nominal budget. There are also diversity constraints: each Fantasy team can only have one quarterback, for instance, and the athletes can’t all come from the same NFL team.
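As a minimal sketch of the formulation above, assume a made-up pool of five athletes and a three-slot roster (real DFS rosters have nine slots). The athlete names, salaries, points, and budget are all hypothetical, and only the roster-size, budget, and one-quarterback constraints are checked.

```python
# Hypothetical athlete pool: (name, position, salary, realized points).
# All values are invented for illustration only.
ATHLETES = [
    ("QB_A", "QB", 7000, 21.5),
    ("QB_B", "QB", 6200, 17.0),
    ("RB_A", "RB", 8000, 24.0),
    ("RB_B", "RB", 5500, 11.5),
    ("WR_A", "WR", 6800, 15.0),
]


def team_score(w, delta):
    """Fantasy score F = w^T delta for a 0/1 team vector w."""
    return sum(w_i * d_i for w_i, d_i in zip(w, delta))


def is_feasible(w, athletes, budget, roster_size):
    """Check roster size, budget, and the one-quarterback rule."""
    picked = [a for w_i, a in zip(w, athletes) if w_i == 1]
    total_salary = sum(a[2] for a in picked)
    n_qb = sum(1 for a in picked if a[1] == "QB")
    return len(picked) == roster_size and total_salary <= budget and n_qb == 1


delta = [a[3] for a in ATHLETES]  # realized performance vector
w = [1, 0, 1, 0, 1]               # pick QB_A, RB_A, WR_A

print(team_score(w, delta))                # 21.5 + 24.0 + 15.0
print(is_feasible(w, ATHLETES, 22000, 3))  # within budget, exactly one QB
```

In practice δ is not known when the team is chosen, which is what makes the selection problem hard.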

**Modeling Opponents**

Haugh also looked to predict DFS opponents’ behavior to inform which players his model chose. This is what distinguishes his method from previous Fantasy research.

The number *O* of opponents ranges from one to half a million, depending on the platform’s entry fee and popularity. Each opponent’s Fantasy portfolio is also represented by a *P*-dimensional vector, $w_o$, and its score can be calculated as:

$$G_o := w_o^{\top}\delta$$

To make it into the top *r*th percentile, the model needs to beat the stochastic benchmark $G^{(r)}$, which depends on two unknown quantities: the athletes’ performance ($\delta$) and the teams other DFS players select ($W_{op}$).
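To make the benchmark concrete, here is a small sketch that assumes every opponent’s score is already known (in reality the scores are random until the games finish). The helper name `benchmark` and the convention that `top_percent=20` means “beat 80 percent of opponents” are illustrative assumptions, not notation from the talk.

```python
import math


def benchmark(opponent_scores, top_percent):
    """Score an entry must exceed to land in the top `top_percent` of the field.

    Sorts the opponents' scores and returns the order statistic below which
    (100 - top_percent) percent of opponents fall. Ties and the entry's own
    place in the ranking are ignored for simplicity.
    """
    ranked = sorted(opponent_scores)
    # Number of opponents the entry needs to beat.
    n_to_beat = math.ceil((100 - top_percent) * len(ranked) / 100)
    n_to_beat = min(max(n_to_beat, 1), len(ranked))
    return ranked[n_to_beat - 1]


# Made-up opponent scores for one contest.
scores = [88.0, 102.5, 95.0, 120.0, 76.5, 110.0, 99.0, 131.5, 84.0, 105.0]
g = benchmark(scores, top_percent=20)
print(g)  # the 8th-lowest of the 10 scores
```

Because both the athlete scores and the opponents’ rosters are uncertain, this benchmark is itself a random variable.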

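The maximization problem, then, looks like this:

$$\max_{w \in \{0,1\}^{P}} \; P\left\{ w^{\top}\delta > G^{(r)}(W_{op}, \delta) \right\}$$

subject to the budget and roster constraints described above. It’s a complicated and certainly non-linear problem, Haugh said.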

**Constructing an Optimal Team**

The maximization problem above can be restated as:

$$\max_{w} \; P\left\{ Y_w > 0 \right\}$$

where $Y_w := w^{\top}\delta - G^{(r)}$. So, we’re trying to choose a Fantasy team *w* to maximize the probability that $Y_w > 0$. We do so using a mean-variance approach.
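One way to read this objective is as a probability that can be estimated by simulation. The sketch below assumes, purely for illustration, that athlete scores are independent normal draws and that opponents pick rosters uniformly at random. Neither assumption comes from Haugh’s model, and the function name and all numbers are hypothetical.

```python
import math
import random


def simulate_win_probability(w, means, stdevs, n_opponents, roster_size,
                             top_percent, n_sims, seed=0):
    """Monte Carlo estimate of P(Y_w > 0), with Y_w = w^T delta - G^(r)."""
    rng = random.Random(seed)
    n_athletes = len(means)
    wins = 0
    for _ in range(n_sims):
        # Draw one realization of athlete performance (assumed normal).
        delta = [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        # Opponents pick rosters uniformly at random (a toy assumption).
        opp_scores = sorted(
            sum(delta[i] for i in rng.sample(range(n_athletes), roster_size))
            for _ in range(n_opponents)
        )
        # Benchmark: the score that beats (100 - top_percent)% of opponents.
        k = max(1, math.ceil((100 - top_percent) * n_opponents / 100))
        g = opp_scores[k - 1]
        our_score = sum(w_i * d_i for w_i, d_i in zip(w, delta))
        wins += our_score > g
    return wins / n_sims


means = [20.0, 18.0, 15.0, 12.0, 10.0, 8.0]
stdevs = [6.0, 5.0, 7.0, 4.0, 6.0, 3.0]
w = [1, 1, 1, 0, 0, 0]  # pick the three athletes with the highest means

p_hat = simulate_win_probability(w, means, stdevs, n_opponents=50,
                                 roster_size=3, top_percent=20, n_sims=2000)
print(f"Estimated P(Y_w > 0): {p_hat:.3f}")
```

Comparing `p_hat` across candidate teams is a brute-force stand-in for the optimization; it would not scale to the roughly 300 athletes and hundreds of thousands of opponents mentioned in the article.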

If some team composition’s expected $Y_w$ is greater than 0, we should try to decrease the variance because we’re “in the money,” as Haugh said. However, if all team compositions have a negative expected $Y_w$, we must increase the variance to raise the probability that the figure ends up above zero. Haugh’s algorithm builds on this intuition.

In both cases, the variance of $Y_w$ is multiplied by some constant Λ, which is an unknown scalar. That scalar makes the problem more complex, Haugh said. Still, standard software can solve for the best scalar value.
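To see where this intuition comes from, suppose (as an assumption for this sketch, not a claim from the talk) that $Y_w$ is approximately normal with mean $\mu_w$ and variance $\sigma_w^2$, and let $\Phi$ be the standard normal distribution function. Then:

$$P\{Y_w > 0\} = \Phi\!\left(\frac{\mu_w}{\sigma_w}\right)$$

Since $\Phi$ is increasing, maximizing the probability means maximizing $\mu_w / \sigma_w$. When $\mu_w > 0$, a smaller $\sigma_w$ raises the ratio; when $\mu_w < 0$, a larger $\sigma_w$ pulls the ratio toward zero and so raises it. One way to write the two cases, consistent with the description above and with $\Lambda > 0$, is:

$$\max_{w} \; \mu_w - \Lambda\,\sigma_w^2 \quad \text{if some team has } \mu_w > 0, \qquad \max_{w} \; \mu_w + \Lambda\,\sigma_w^2 \quad \text{otherwise.}$$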

**Modeling Opponents**

Modeling opponents’ DFS teams is important because the benchmark $G^{(r)}$ depends on opponents’ entries $W_{op}$. Opponents’ entries could add up to half a million of those 300-dimensional team vectors.

To generate the portfolios of DFS opponents, Haugh used a *Dirichlet distribution*, a distribution over probability vectors.

To show how he used the Dirichlet distribution, Haugh gave the example of a quarterback. A random opponent will choose quarterback *k* with probability $p_{QB}^{k}$, for all *k*. Assuming that these selection probabilities are themselves random, which Haugh said they actually are, they can be modeled with a Dirichlet distribution with parameter $\alpha_{QB}$. How popular a quarterback is, like any NFL athlete, depends on their cost, expected points, momentum, and other features. So really:

$$\alpha_{QB} = \exp\left(X_{QB}\,\beta_{QB}\right)$$

where $X_{QB}$ represents those features and $\beta_{QB}$ the corresponding coefficients.
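A minimal sketch of this idea in plain Python follows, assuming three hypothetical quarterbacks with made-up feature values and made-up coefficients (not fitted values from Haugh’s work). It draws one vector of selection probabilities from a Dirichlet distribution with parameter exp(Xβ), using the standard construction from normalized gamma draws, then lets each simulated opponent pick a quarterback.

```python
import math
import random


def dirichlet_sample(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical features per quarterback:
# [intercept, salary in units of $10k, expected points / 10].
X_QB = [
    [1.0, 0.70, 2.1],
    [1.0, 0.62, 1.7],
    [1.0, 0.55, 1.4],
]
BETA_QB = [0.5, -1.0, 1.2]  # made-up coefficients, not fitted values

# alpha_QB = exp(X_QB beta_QB), computed row by row.
alpha_qb = [math.exp(sum(x * b for x, b in zip(row, BETA_QB))) for row in X_QB]

rng = random.Random(42)
p_qb = dirichlet_sample(alpha_qb, rng)  # one draw of selection probabilities

# Each of 1,000 simulated opponents picks quarterback k with probability p_qb[k].
picks = rng.choices(range(len(p_qb)), weights=p_qb, k=1000)
ownership = [picks.count(k) / len(picks) for k in range(len(p_qb))]

print("alpha:", [round(a, 2) for a in alpha_qb])
print("sampled probabilities:", [round(p, 3) for p in p_qb])
print("simulated ownership:", ownership)
```

Repeating the draw many times gives a distribution of ownership shares per quarterback, which is the kind of quantity that can be compared against actual ownership.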

This process of determining what proportion of opponents will select which quarterback actually performs extremely well, though it varies by week. In the results Haugh showed for one particularly successful week, the model’s prediction appeared as a blue star, its interval as a blue bar, and the actual ownership of each quarterback in black.

**Numerical Results**

Though Haugh’s presentation focused on double-up Fantasy competitions, where all teams that score above a certain benchmark receive the same reward, he tested his algorithm in the 2017-18 NFL season with a focus on top-heavy competitions. In these competitions, different reward amounts are distributed to winners based on how well they do. Very few people win anything, but those who do win get a substantial reward.

He invested $50 per week for both the double-up and top-heavy models and saw a return on investment of more than 350 percent in 17 weeks. The model used a previous study, which didn’t account for opponents, as a benchmark, and outpaced it.


In some senses, though, Haugh said he picked the worst sport to model. The NFL has very high variance because of weather, injuries, the number of players, and a short season. In future work, he said he would definitely try the algorithm on different sports. He also said it would perform better if it actively updated parameter estimates, i.e., reacted to breaking news that affects which players get picked and how they will perform.

The model in part depends on Monte Carlo samples and order statistics, which Haugh didn’t address much in his talk at ODSC. For a more detailed description, read the paper he produced in April. At ODSC Europe, he also showed how to determine the value of insider trading and collusion. To learn more, watch the video of his lecture.