How AI is Learning to Play Dota 2 – and Win

In August, a team of bots developed by California-based AI company OpenAI competed against some of the world’s best human players at Dota 2. The bots took on two leading (human) Dota 2 teams, paiN gaming and Chinese Legends, in two demonstration matches at The International 8, the annual Dota 2 world championship, held this year in Vancouver. Although the bots lost both matches, their play is being hailed as a major AI success.

Dota 2 and OpenAI

Dota 2 is a multiplayer online battle arena (MOBA) video game produced by the Washington-based game developer Valve Corporation. The name originally stood for “Defense of the Ancients,” an immensely popular mod for Blizzard Entertainment’s Warcraft III dating back to 2003. Released in 2013, Dota 2 is a standalone game that serves as a modernized sequel to the original DotA mod. Since then, Dota 2 has become one of the most popular online multiplayer games of all time, with up to one million players online at any given time.

In Dota 2, two teams of five players compete to destroy the opposing team’s “Ancient” – a large structure near their starting point on the map (the “arena”) – whilst defending their own. The ten players each control one of over a hundred characters known as heroes, all with unique strengths, weaknesses, and abilities. In addition to their core abilities to deal damage against the opposing team and its Ancient, heroes can also accumulate experience points and collect items throughout the map to upgrade their combat abilities. Thus, winning teams must build a strategy that incorporates hero selection and synergy, experience and item gathering, combat skills and abilities, knowledge of the map, and (if available) information on the opposing team. At the highest levels of competitive Dota 2, top teams are composed of full-time professionals who “theorycraft” around optimal strategies for various heroes and maps and spend months and years practicing to face off against other leading players at tournament events.

“Team Human” playing against OpenAI Five at a demonstration match (Image source: https://blog.openai.com/openai-five/)

Dota 2’s complexity and popularity have made it a focus for some of the world’s leading AI experts at the renowned non-profit OpenAI. Founded in 2015 by Elon Musk and Sam Altman, president of Y Combinator, OpenAI has a mission to build safe “artificial general intelligence” (AGI) – a machine that can perform any intellectual task that a human can. By building AIs that succeed in complex and dynamic games like Dota 2, the company believes it can bring humanity closer to the ultimate goal of making AGIs that function in the messiness of the real world.

OpenAI Five

OpenAI Five is a team of five neural networks trained with a massively scaled version of “Proximal Policy Optimization,” a reinforcement learning algorithm designed by OpenAI. The bots are trained to maximize the exponentially decayed sum of future rewards, using data produced entirely from self-play. Essentially, this means the bots play against themselves to “learn” to behave in ways that maximize their chances of winning. Of course, several factors make this training process incredibly difficult. Here are just two:

  1. Machine learning in the context of a game like Dota 2 requires immense amounts of data. This is because Dota 2 presents a “high-dimensional, continuous observation space” – an open world with countless trillions of possible “board states” and possible decisions at any given time. Indeed, the OpenAI bots observe the state of a Dota 2 match at any given moment as a set of 20,000 numbers. Compare that to a game like chess, where any given board state can be represented by around 70 values. In addition, because of Dota 2’s continuous nature, OpenAI Five must collect 7.5 observations on the state of the game per second and feed this information into its reward maximization. Staggeringly, each bot is trained with data from 180 years’ worth of simulated Dota 2 matches per day!
  2. Dota 2 is a game of comparatively long time horizons. Matches typically last between 30 and 45 minutes of continuous gameplay. True, games of chess and Go can last a long time, but their turn-based structure means that games are typically won or lost within a few hundred moves at most, whereas a 45-minute Dota 2 match at 7.5 observations per second spans roughly 20,000 decision points. Apart from the massive number of possible moves for each bot in Dota 2, the problem arises of determining the time horizon for reward maximization. In other words, should the bot be trying to maximize its rewards in the next second? Or should it be trying to maximize its rewards over the entire game? Achieving ultimate victory often requires forgoing short-term rewards for long-term strategy. At the same time, focusing only on long-term strategy is computationally expensive, and can render the bots less effective at critical junctures in the battle. Taken together, this means that the OpenAI developers must select the correct “half-life” of future rewards (that is, how quickly the exponentially decaying sum discounts them) to balance short- and long-term performance.
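To make the “exponentially decayed sum” and its “half-life” concrete, here is a minimal Python sketch. It is an illustration only, not OpenAI’s code: the function names, the toy reward, and the two half-lives (5 seconds and 5 minutes) are assumptions chosen for the example. Only the 7.5 observations per second and the 45-minute match length come from the figures above.

```python
OBS_PER_SECOND = 7.5  # observation rate quoted in the article


def gamma_from_half_life(half_life_seconds, obs_per_second=OBS_PER_SECOND):
    """Per-step discount factor such that a reward's weight halves
    every `half_life_seconds` of game time (hypothetical helper)."""
    steps = half_life_seconds * obs_per_second
    return 0.5 ** (1.0 / steps)


def discounted_return(rewards, gamma):
    """Exponentially decayed sum: r[0] + gamma*r[1] + gamma^2*r[2] + ..."""
    total = 0.0
    # Work backwards so each step multiplies the running total by gamma once.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Decision points in a 45-minute match at 7.5 observations per second.
steps_per_match = int(45 * 60 * OBS_PER_SECOND)

# Two illustrative (assumed) half-lives: very short-sighted vs. patient.
short_gamma = gamma_from_half_life(5)    # 5 seconds
long_gamma = gamma_from_half_life(300)   # 5 minutes

# Toy scenario: a single reward of 1.0 that arrives 60 seconds from now.
delay_steps = int(60 * OBS_PER_SECOND)
rewards = [0.0] * delay_steps + [1.0]

print(steps_per_match)                          # 20250
print(discounted_return(rewards, short_gamma))  # about 0.0002
print(discounted_return(rewards, long_gamma))   # about 0.87
```

With the 5-second half-life, a reward one minute away is worth almost nothing to the bot today, so it would rarely give up an immediate gain for it. With the 5-minute half-life, the same reward keeps most of its value, which favors longer-term play.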

The Results

In the months leading up to The International 8, OpenAI Five beat a number of amateur and semi-professional Dota 2 teams. But the team fell short at the world championship, losing to the pros of the Brazilian team paiN gaming and to the Chinese Legends. OpenAI Five started both games strongly, with the bots executing tasks and strategies with extreme precision. Commentators noted how the bots’ behavior differed from that of the top human players. Mike Cook, an AI games researcher, told The Verge, “Often, the humans would win a fight and then let their guard down slightly, expecting the enemy team to retreat and regroup. But the bots don’t do that. If they can see a kill, they take it.” But when the human teams took the lead further into the matches, the bots faltered, seemingly unable to make the kinds of risky plays needed to come from behind. Some observers have speculated that the AIs’ reward programming makes them prefer play styles that offer higher-certainty, smaller rewards over the low-probability, high-payoff plays that are often necessary to swing the game. By the end of both matches, the humans were far ahead of their machine adversaries.

Does this mean that AI has met its match? Not at all. With more data and more training, especially data collected during the matches against the pros at The International 8, the bots are likely to return even stronger in future Dota 2 tournaments. Their progress will be well worth watching for those interested in the ultimate dream of artificial general intelligence.

Alex Amari

I’m a graduate student at Oxford University pursuing an MSc in Social Data Science with the ultimate goal of working in tech entrepreneurship.