Experimenting with games is an excellent way to better our understanding of deep learning. In recent years, AI has famously become the best player in a number of games such as Go, Backgammon, and Chess.
However, all of these are games of perfect information. Nothing is hidden from either player, so no decision has to rest on reading an opponent. With this in mind, it never actually surprised me that well-designed deep learning models succeeded at them.
I have always admired Texas Hold’em poker, because to do well a player must draw deductions from limited clues while leading their opponents to deduce badly. That feels like a deeper, distinctly human skill. Poker is a game of imperfect information.
I have always wondered how successful deep learning could be in the world of poker. Seeing AI beat a professional poker player would be fascinating, suggestive of AI’s future, and perhaps a little sinister, too. If you know how to play, you can skip the next paragraph. If not, it’s worth reading my extremely vague overview of the game.
Texas Hold’em revolves around each player receiving two random, private cards. Each hand has four rounds of betting: after the first round, the dealer reveals three public (community) cards at once, and one more before each of the two later rounds. The way a player’s two private cards combine with the public cards determines the strength of their position. After the last round of betting, the remaining players reveal their cards, and the best hand wins the pot. A player can fold (give up) during any round, at the expense of forfeiting everything they have already bet.
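For readers who prefer code to prose, the structure of a single hand can be sketched as follows. In the standard rules the first betting round happens before any public cards, the flop reveals three community cards at once, and the turn and river add one each; the names below are my own, not from any poker library.

```python
# Illustrative sketch of the streets of a Texas Hold'em hand.
STREETS = [
    ("pre-flop", 0),  # betting with only the two private (hole) cards
    ("flop", 3),      # three community cards revealed, then betting
    ("turn", 1),      # fourth community card, then betting
    ("river", 1),     # fifth community card, then final betting
]

def deal_streets():
    community = []
    for street, n_new in STREETS:
        community += [f"{street}-{i + 1}" for i in range(n_new)]
    return community

# After the river betting round, remaining players show down: the best
# five-card hand made from the two private plus five community cards
# wins the pot.
print(deal_streets())  # five community cards in total
```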
The game is mainly about how players bet and when. Betting can be used to mislead an opponent, to the point where they fold a better hand. My explanation does not capture quite how complex the game gets, with so many factors to weigh on every play — I highly suggest watching a proper tutorial on Texas Hold’em to understand the intricacies.
DeepStack appears to be the first project to successfully pit deep learning against professional poker players. It beat 11 professionals over roughly 44,000 hands in total.
DeepStack’s technical approach is particularly interesting because it uses a separate neural network for each stage of the game (pre-flop, flop, and so on), acknowledging that each stage demands a different perspective. On top of that, DeepStack constructed explicit training targets for these networks by generating subtrees of possible continuations and solving them for counterfactual values. In other words, the possible outcomes of the next stage were mapped out, and specific moves became favoured in specific scenarios over time.
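As a rough illustration of the per-stage idea, the network used for a lookahead could be selected by how many community cards are on the board. The class and function names here are my own invention, not DeepStack’s; each network is just a named placeholder standing in for a deep fully connected model.

```python
# Hypothetical sketch of keeping a separate value network per stage of
# the hand.
class CounterfactualValueNet:
    def __init__(self, street):
        self.street = street

    def estimate(self, ranges, pot_size, board):
        # Stand-in for the learned mapping from (ranges, pot, board)
        # to per-hand counterfactual values.
        raise NotImplementedError

NETS = {
    "flop": CounterfactualValueNet("flop"),
    "turn": CounterfactualValueNet("turn"),
}

def net_for_board(board):
    # Three community cards means we are on the flop; four means the turn.
    # Each stage gets its own network because the remaining lookahead tree
    # has a different shape at each stage.
    return NETS["flop"] if len(board) == 3 else NETS["turn"]
```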
The network was trained on millions of randomly generated hands: random public cards, random pocket cards, and random pot sizes. These are also the inputs to the first neural network (which consists of seven fully connected hidden layers). This network is embedded in an outer network that forces the counterfactual values to satisfy the zero-sum property. The outer computation takes the estimated counterfactual values, then computes a weighted sum using the two players’ input ranges, resulting in separate estimates of the game value. Here is a visualization of the whole network, taken from DeepStack’s writeup:
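A minimal sketch of that outer zero-sum step, under some labelled assumptions: the inner network outputs estimated counterfactual values v1 and v2 with one entry per possible private hand, and r1 and r2 are the players’ input ranges (probability distributions over those hands, each summing to 1). The variable names and the uniform correction are mine, not DeepStack’s exact implementation.

```python
import numpy as np

def zero_sum_correct(v1, v2, r1, r2):
    # Weighting each player's values by their range gives two separate
    # estimates of the overall game value.
    g1 = np.dot(r1, v1)
    g2 = np.dot(r2, v2)
    # In a zero-sum game these estimates should cancel out; any residual
    # is estimation error.
    error = g1 + g2
    # Subtracting half the error uniformly from each player's values
    # (valid because each range sums to 1) forces the corrected estimates
    # to satisfy the zero-sum property exactly.
    return v1 - 0.5 * error, v2 - 0.5 * error
```

After this correction, the two range-weighted game-value estimates sum to exactly zero, whatever the inner network produced.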
It is amazing to see AI succeed in a game I know to be so human-based. It’s another reminder of how powerful machine learning is becoming, and how important data science is in feeding AI’s future. To learn more about DeepStack, check out its website or watch some of the games the model played here.