

On Taking Things Too Seriously: Holiday Edition
Posted by Will McGinnis on January 7, 2018

For some reason, Atlanta got a pretty significant amount of snow yesterday, so I’ve been mostly stuck at home. With that much time on my hands, I sometimes spend too much of it on things that don’t really matter all that much. Lately I’ve been fascinated with rating systems (see my earlier post on elote), so they were at the front of my mind this week.
Every year around this time, my family runs a college football bowl game pick ’em pool. We each pick who we think will win each bowl game, and whoever gets the most right at the end of it all (sometimes weighted by the tier of the game) wins a prize. The prize is unimportant; what’s important is that I’ve never won. And that bothers me.
So for the past day I’ve been continuing to develop elote, a Python package for developing rating systems, along with two complementary projects that I just published:
- keeks: a python package for bankroll allocation strategies, like the Kelly Criterion
- keeks-elote: a python package for backtesting coupled rating systems and bankroll allocation strategies
So with all three of these, some historical odds data, and the data for this season’s college football games, I can build a rating system capable of ranking football teams at each week of the season, a prediction component that estimates the likelihood of victory between any two teams from those rankings, a bankroll allocation strategy that turns those estimates and the odds into a set of bets, and a backtesting system to evaluate the whole thing. That sounds like a lot, because it is.
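To make the "ratings to win probability" step concrete, here is a minimal from-scratch sketch of classic Elo, the simplest of the rating systems elote covers. It is not elote’s implementation. The logistic formula with a 400-point scale is the standard Elo convention. The K-factor of 32, the starting rating of 1500 and the team names are illustrative assumptions.

```python
from typing import Tuple

# Illustrative constants (assumptions, not elote defaults)
K_FACTOR = 32.0
START_RATING = 1500.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Estimated probability that A beats B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner: float, loser: float, k: float = K_FACTOR) -> Tuple[float, float]:
    """Return new (winner, loser) ratings after one game.

    The winner gains k * (1 - p_win) and the loser gives up the same amount,
    so an upset moves the ratings more than an expected result does.
    """
    p_win = expected_score(winner, loser)
    delta = k * (1.0 - p_win)
    return winner + delta, loser - delta


# Hypothetical teams, both starting at the same rating
ratings = {"Team A": START_RATING, "Team B": START_RATING}
print(expected_score(ratings["Team A"], ratings["Team B"]))  # 0.5

ratings["Team A"], ratings["Team B"] = update(ratings["Team A"], ratings["Team B"])
print(ratings)  # {'Team A': 1516.0, 'Team B': 1484.0}
print(round(expected_score(ratings["Team A"], ratings["Team B"]), 3))  # 0.546
```

Glicko, which the script below actually uses, builds on the same idea but also tracks how uncertain each team’s rating is.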
So here’s what the script actually looks like at the end (I recommend reading the elote post before this if you haven’t already):
```python
import datetime
import json

from elote import LambdaArena, EloCompetitor, ECFCompetitor, GlickoCompetitor, DWZCompetitor
from keeks import KellyCriterion, BankRoll, Opportunity, AllOnBest
from keeks_elote import Backtest


# we already know the winner, so the lambda here is trivial
def func(a, b):
    return True


# the matchups are filtered down to only those between teams deemed 'reasonable', by me.
with open('./data/cfb_teams_filtered.json', 'r') as f:
    filt = {x for _, x in json.load(f).items()}

with open('./data/cfb_w_odds.json', 'r') as f:
    games = json.load(f)

# batch the games by week of year
games = [(datetime.datetime.strptime(x.get('date'), '%Y%m%d'), x) for x in games]
start_date = datetime.datetime(2017, 8, 21)
chunks = dict()
for week_no in range(1, 20):
    end_date = start_date + datetime.timedelta(days=7)
    chunks[week_no] = [v for k, v in games if start_date < k <= end_date]
    start_date = end_date

# set up the objects
arena = LambdaArena(func, base_competitor=GlickoCompetitor)
bank = BankRoll(10000, percent_bettable=0.05, max_draw_down=1.0, verbose=1)
# strategy = KellyCriterion(bankroll=bank, scale_bets=True, verbose=1)
strategy = AllOnBest(bankroll=bank, verbose=1)
backtest = Backtest(arena)

# simulates the betting
backtest.run_explicit(chunks, strategy)

# prints projected results of the next week based on this week's rankings
backtest.run_and_project(chunks)
```
All of this, including the source data, is in the repo for keeks-elote, under examples.
So to begin with, we are basically just setting up our data. Keeks-elote takes data of the form:
```
{
    period: [
        {
            "winner": label,
            "loser": label,
            "winner_odds": float,
            "loser_odds": float
        },
        ...
    ],
    ...
}
```
So each week of the season is a period, and each game that week is a nested blob with winner and loser indicated, and odds if we have them. Keeks-elote will iterate through the weeks, making bets and then updating the rankings based on the results of the week.
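The ordering matters here. Bets for a week have to be placed using only ratings from earlier weeks, or else the backtest peeks at the results it is supposed to predict. Below is a minimal sketch of that loop. It is written from scratch, not taken from keeks-elote. It reuses a plain Elo update, ignores the odds fields and only counts how often the higher-rated team won. The team labels and K-factor are hypothetical.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard logistic Elo win probability for A over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def backtest(periods: Dict[int, List[dict]], k: float = 32.0,
             start: float = 1500.0) -> Tuple[Dict[str, float], int, int]:
    """Walk the periods in order: predict each game first, then update ratings.

    Returns (final ratings, correct picks, games where the ratings were not tied).
    """
    ratings: Dict[str, float] = {}
    correct = total = 0
    for period in sorted(periods):
        games = periods[period]
        # 1) Predict every game this period using only earlier periods' ratings.
        for game in games:
            r_win = ratings.get(game["winner"], start)
            r_lose = ratings.get(game["loser"], start)
            if r_win != r_lose:  # skip games where the ratings are tied
                total += 1
                correct += int(r_win > r_lose)
        # 2) Only after all predictions, fold this period's results into the ratings.
        for game in games:
            r_win = ratings.get(game["winner"], start)
            r_lose = ratings.get(game["loser"], start)
            delta = k * (1.0 - expected_score(r_win, r_lose))
            ratings[game["winner"]] = r_win + delta
            ratings[game["loser"]] = r_lose - delta
    return ratings, correct, total


# Toy data in the {period: [{"winner": ..., "loser": ...}, ...]} shape (odds omitted)
periods = {
    1: [{"winner": "A", "loser": "B"}, {"winner": "C", "loser": "D"}],
    2: [{"winner": "A", "loser": "D"}, {"winner": "B", "loser": "C"}],
}
final, correct, total = backtest(periods)
print(f"{correct}/{total} picks correct")  # 1/2 picks correct
```

In week 1 every team is unrated, so no picks are counted. In week 2 the ratings favor A (correct) and C (an upset by B).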
As the user, you can see we really only have to define a few things once the data is in the correct format:
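- Arena: we need to define a lambda arena, which takes in the matchups as passed in and uses the lambda to decide who won each one. Since our data already records the winner, the lambda here just returns True. As I work with more datasets, I expect this can be handled under the hood by the backtester, but we will see.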
- Bankroll: the bankroll is only needed if you are using a strategy, which in turn is only needed if you are going to use run_explicit to simulate bets. It takes a starting value, an optional max drawdown percentage at which to stop betting, and the percentage of the total to bet each period.
- Strategy: the strategy is what converts likelihoods and odds into a set of bets. Two types are currently implemented, and both are shown here. KellyCriterion sizes bets from the estimated edge over the odds. AllOnBest just puts the maximum bettable amount on the bet with the highest likelihood of being correct.
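Here is a from-scratch sketch that illustrates the difference between the two strategies. It is not keeks’s code. The Candidate dataclass, the use of decimal odds, and the proportional shrinking of Kelly stakes to fit the per-period budget are all assumptions; keeks’s scale_bets option may work differently. The Kelly stake fraction used here is the textbook f* = (b·p − q) / b, where p is the estimated win probability, q = 1 − p, and b is the net payout per unit staked.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """Hypothetical stand-in for a betting opportunity (not keeks.Opportunity)."""
    label: str
    probability: float   # model's estimated probability that this side wins
    decimal_odds: float  # total return per unit staked, stake included


def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Textbook Kelly fraction f* = (b*p - q) / b with b = decimal_odds - 1.

    Negative values (no edge) are clipped to 0, meaning "don't bet".
    """
    b = decimal_odds - 1.0
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


def kelly_bets(cands: List[Candidate], bankroll: float,
               percent_bettable: float) -> Dict[str, float]:
    """Size each bet by Kelly, then shrink all stakes proportionally to fit the budget."""
    budget = bankroll * percent_bettable
    raw = {c.label: kelly_fraction(c.probability, c.decimal_odds) * bankroll for c in cands}
    total = sum(raw.values())
    scale = min(1.0, budget / total) if total > 0 else 0.0
    return {label: stake * scale for label, stake in raw.items() if stake > 0}


def all_on_best(cands: List[Candidate], bankroll: float,
                percent_bettable: float) -> Dict[str, float]:
    """Put the whole per-period budget on the single most likely outcome."""
    best = max(cands, key=lambda c: c.probability)
    return {best.label: bankroll * percent_bettable}


# Hypothetical week: X has an edge at even money, Y is likely but priced too short
week = [Candidate("X", 0.60, 2.0), Candidate("Y", 0.80, 1.2)]
print(kelly_bets(week, 10_000, 0.05))   # stakes only X (about 500); Y has no edge
print(all_on_best(week, 10_000, 0.05))  # the whole 500 goes on Y
```

The example shows why the two can disagree. Y is the more likely winner, but at decimal odds of 1.2 an 80% chance returns only 0.8 × 1.2 = 0.96 per unit staked. Kelly therefore skips Y, while AllOnBest puts everything on it.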
As configured in that script, with the data I have, I get this output (betting doesn’t start until we have a few weeks of ratings):
```
running with week 1
running with week 2
running with week 3
running with week 4
evaluating 500.0 on Opportunity: Buffalo over FL_Atlantic
depositing 384.62 in winnings
bankroll: 10384.62
running with week 5
evaluating 519.23 on Opportunity: Stanford over Arizona_St
depositing 51.92 in winnings
bankroll: 10436.54
running with week 6
evaluating 521.83 on Opportunity: W_Michigan over Buffalo
depositing 177.49 in winnings
bankroll: 10614.03
running with week 7
evaluating 530.7 on Opportunity: Arkansas_St over Coastal_Car
depositing 63.71 in winnings
bankroll: 10677.74
running with week 8
evaluating 533.89 on Opportunity: Colorado_St over New_Mexico
depositing 144.29 in winnings
bankroll: 10822.04
running with week 9
evaluating 541.1 on Opportunity: Notre_Dame over NC_State
depositing 184.05 in winnings
bankroll: 11006.08
running with week 10
evaluating 550.3 on Opportunity: Arkansas over Coastal_Car
depositing 16.51 in winnings
bankroll: 11022.59
running with week 11
evaluating 551.13 on Opportunity: Oklahoma over TCU
depositing 181.89 in winnings
bankroll: 11204.49
running with week 12
evaluating 560.22 on Opportunity: Wake_Forest over NC_State
depositing 368.57 in winnings
bankroll: 11573.05
running with week 13
evaluating 578.65 on Opportunity: Washington over Washington_St
depositing 162.09 in winnings
bankroll: 11735.14
running with week 14
evaluating 586.76 on Opportunity: Boise_St over Fresno_St
depositing 152.41 in winnings
bankroll: 11887.54
```
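As a quick sanity check on what percent_bettable does, the log can be replayed by hand. The sketch below copies the bankroll, stake and winnings from the output above. It then confirms two things, each to within a cent of display rounding:

- each stake is 5% of the bankroll going into that week;
- the bankrolls chain together.

It only re-derives numbers already in the log and does not call keeks.

```python
# (bankroll before the bet, stake, winnings) for weeks 4-14, copied from the log above
LOG = [
    (10000.00, 500.00, 384.62),
    (10384.62, 519.23, 51.92),
    (10436.54, 521.83, 177.49),
    (10614.03, 530.70, 63.71),
    (10677.74, 533.89, 144.29),
    (10822.04, 541.10, 184.05),
    (11006.08, 550.30, 16.51),
    (11022.59, 551.13, 181.89),
    (11204.49, 560.22, 368.57),
    (11573.05, 578.65, 162.09),
    (11735.14, 586.76, 152.41),
]
FINAL_BANKROLL = 11887.54
PERCENT_BETTABLE = 0.05  # matches BankRoll(10000, percent_bettable=0.05, ...) in the script


def replay(log, pct, cent_tol=0.015):
    """Return the largest gap between a logged stake and pct * bankroll.

    Also asserts that each week's bankroll equals the previous bankroll plus
    winnings, allowing about a cent of drift because the log rounds to cents.
    """
    worst_gap = 0.0
    expected_bankroll = log[0][0]
    for bankroll, stake, winnings in log:
        assert abs(bankroll - expected_bankroll) <= cent_tol
        worst_gap = max(worst_gap, abs(stake - pct * bankroll))
        expected_bankroll = bankroll + winnings
    assert abs(expected_bankroll - FINAL_BANKROLL) <= cent_tol
    return worst_gap


gap = replay(LOG, PERCENT_BETTABLE)
print(f"largest stake deviation from 5% of bankroll: {gap:.4f}")
print(f"return over the season: {FINAL_BANKROLL / LOG[0][0] - 1:.2%}")
```

Every bet in the log won, so this run never came close to the max_draw_down stop.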
Seems reasonable to me.
So before I get to my bowl picks from this system: these projects are pretty fun, and we can make some interesting projections on a lot of things, both within and outside of sports. If you’re interested in this kind of thing, comment here or find any of the projects (elote, keeks, and keeks-elote) on GitHub and get involved.
OK, here are the picks based on Glicko1 ratings, which performed well in backtests (and, more importantly, have Auburn winning and Alabama losing). I’ll do another post in about a month on how we did:
| Predicted winner | Predicted loser |
|---|---|
| West Virginia | Utah |
| N Illinois | Duke |
| UCLA | Kansas State |
| Florida State | Southern Miss |
| BC | Iowa |
| Purdue | Arizona |
| Missou | Texas |
| Navy | Virginia |
| Oklahoma St. | VT |
| TCU | Stanford |
| Michigan St. | Washington St. |
| Texas A&M | Wake Forest |
| Arizona St. | NC State |
| Northwestern | Kentucky |
| New Mexico State | Utah State |
| Ohio State | USC |
| Miss St. | Louisville |
| Memphis | Iowa St. |
| Penn St. | Washington |
| Miami | Wisconsin |
| USCe | Michigan |
| Auburn | UCF |
| LSU | Notre Dame |
| Oklahoma | UGA |
| Clemson | Alabama |
| N. Texas | Troy |
| Georgia State | WKU |
| Oregon | Boise St. |
| Colorado St. | Marshall |
| Arkansas St. | MTSU |
| Grambling | NC A&T |
| Reinhardt | St. Francis IN |
| FL Atlantic | Akron |
| SMU | Louisiana Tech |
| Florida International | Temple |
| Ohio | UAB |
| Wyoming | C. Michigan |
| S. Florida | Texas Tech |
| Army | SDSU |
| Toledo | App State |
| Houston | Fresno State |