Editor’s Note: If you’re anything like me, stubbornly loyal to Pokémon Red, now trying to catch up to the cool kids post Poképocalypse, you’ll find this exploration worth a share.

About a month ago Pokémon Go, Niantic’s mobile augmented reality game based on the popular franchise, was everywhere. The media’s gaze followed as thousands of old and new players flocked outside to join in the fun across the United States. This gaze eventually went elsewhere, but the game’s devotees increased as it became available to more countries. Regardless of its long-term legacy from a gaming perspective, what Pokémon Go represents is the first widespread embrace of (quasi) augmented reality, an area which is set to change the way we interact with the world.
Augmented reality’s current tethers to data science are slim, but Pokémon Go, like its console-based forefathers, is data-rich with the various stats of the game’s creatures. This data set provides an interesting sample. Unfortunately it only contains information on 75 creatures across four distinct species of Pokémon, but that should be enough to catch a glimpse of some interesting trends in these statistics.



The species represented in the data are Pidgey, Weedle, Caterpie, and Eevee. Pidgey is represented the most, with 52% of the data points, while Eevee accounts for only 8%. The difference in sample size is cause to wonder whether the variability in the various metrics is accurately represented for each species.
In terms of combat points, Pidgey has the widest range of values, but its median is second to Eevee’s in this sample and less than half as large. Eevee’s narrow combat point range carries over to health points, as does its lead with the highest median. Here Caterpie takes Pidgey’s place, with the second-highest median and the widest range. Surprisingly, the top end of Caterpie’s health points surpasses even Eevee’s. A scatter plot of both variables reveals that the Pokémon with more health have more capacity to cause damage.
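The group summaries described above (share per species, median and range per species, and the link between health and combat points) could be computed along these lines. This is only a sketch with pandas: the column names `species`, `cp`, and `hp` are my assumptions about how such a table might be laid out, and the toy rows are made up for illustration rather than taken from the data set.

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration only.
toy = pd.DataFrame({
    "species": ["Pidgey", "Pidgey", "Pidgey", "Eevee", "Eevee", "Weedle"],
    "cp": [100, 250, 400, 600, 640, 90],   # combat points (assumed column name)
    "hp": [30, 45, 60, 80, 84, 28],        # health points (assumed column name)
})

def summarize(df, column):
    """Return the median and range (max - min) of `column` for each species."""
    grouped = df.groupby("species")[column]
    return pd.DataFrame({
        "median": grouped.median(),
        "range": grouped.max() - grouped.min(),
    })

# Fraction of rows belonging to each species.
species_share = toy["species"].value_counts(normalize=True)

# Per-species summaries for combat points and health points.
cp_summary = summarize(toy, "cp")
hp_summary = summarize(toy, "hp")

# Pearson correlation between combat points and health points.
cp_hp_corr = toy["cp"].corr(toy["hp"])

print(species_share)
print(cp_summary)
print(hp_summary)
print(cp_hp_corr)
```

Using the range alongside the median is a deliberate choice here, since it mirrors the two quantities compared in the text; with species as small as Eevee's share of the sample, both numbers should be read cautiously.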


The data also has information on these stats for the evolutions of these creatures. The concept is a key feature of the universe, as some Pokémon can morph into different forms once they increase in strength. For example, Pidgey evolves into Pidgeotto, and Weedle and Caterpie evolve into Kakuna and Metapod respectively. Eevee presents a special case with its multi-branched evolutionary tree. Depending on certain factors, an Eevee can evolve into a Jolteon, Vaporeon, or Flareon. (Later editions of the franchise introduce more branches into this tree, but Pokémon Go only uses the original 151 Pokémon catalog.) Unfortunately the data doesn’t explicitly take Eevee’s special case into consideration, and only has information on one of the two levels of evolution for the other Pokémon. (Pidgeotto can further evolve into Pidgeot, to name one.)



Unsurprisingly, both health and combat ability show significant jumps when a Pokémon evolves. Eevee continues the pre-evolution trend of leading in both metrics, but shows an interesting deviation elsewhere. The post-evolution ranges for combat ability decrease across the board, but for health Eevee moves away from this trend: while the other ranges decreased, Eevee’s actually increased. From a bivariate perspective, evolution doesn’t change the strong positive correlation between health and combat strength.


The patterns in pre- and post-evolution height and weight aren’t really interesting in either a univariate or bivariate sense. The data also shows no relationship between height or weight and either health or combat points.



In the older Pokémon games each Pokémon can be taught four moves to use in battle. For Pokémon Go this scope is reduced to a weak and a strong attack. In their basic forms, Weedle has the weakest attack in both categories. At the other end of the spectrum, Eevee has both the strongest weak attack and the strongest strong attack. (Pidgey is second in both categories.) Interestingly enough, Weedle and Caterpie share the same strong attack strength before and after evolution. The name of that attack is Struggle.


After evolution Pidgey has the strongest weak attack, but Eevee’s domination in the strong attack category is strengthened even more. Above I said that the data doesn’t explicitly encode Eevee’s different evolutions, but it does so implicitly through the new weak and strong attacks and two other new attack-related columns (chart below). The unique instances of the new weak attack among the six values are Water Gun, Ember, and Thundershock. These correspond to Vaporeon (Eevee’s water evolution), Flareon (Eevee’s fire evolution), and Jolteon (its electric evolution). The sample is too small to look closely at any difference between these three evolutions, but a peek won’t hurt.
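The implicit encoding described above can be made explicit with a simple lookup. The attack-to-evolution mapping comes straight from the paragraph; everything else in this sketch is an assumption, including the column names `species` and `attack_weak_new`, the `evolution` column it creates, and the toy rows.

```python
import pandas as pd

# Mapping stated in the text: each new weak attack identifies one Eevee evolution.
ATTACK_TO_EVOLUTION = {
    "Water Gun": "Vaporeon",
    "Ember": "Flareon",
    "Thundershock": "Jolteon",
}

def label_eevee_evolutions(df):
    """Add an `evolution` column, filled only for rows whose species is Eevee."""
    out = df.copy()
    is_eevee = out["species"] == "Eevee"
    # Blank out non-Eevee attacks first so they can never match the mapping.
    out["evolution"] = out["attack_weak_new"].where(is_eevee).map(ATTACK_TO_EVOLUTION)
    return out

# Hypothetical toy rows (column names and values are illustrative only).
toy = pd.DataFrame({
    "species": ["Eevee", "Eevee", "Pidgey"],
    "attack_weak_new": ["Water Gun", "Ember", "Steel Wing"],
})

labeled = label_eevee_evolutions(toy)
print(labeled)
```

Rows for the other species are left as missing values in the new column, since the text only infers the evolved form for Eevee.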


Flareon is the powerhouse when it comes to attacking strength from any angle. However, Vaporeon dominates in both the health and combat ability categories. This brings up the question of what the relationship between attack values and combat strength really is. Wouldn’t you expect Flareon to top the combat ability table since it does so in both attack categories? I’ve summed the pre- and post-evolution attack strengths and created a new, lengthened data set to look at the relationship between attack strength and combat power. In essence I’m doubling the size of my data set by treating the evolutions as separate data points.
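One way to read the reshaping described above is: add the weak and strong attack values at each stage, then stack the pre-evolution and post-evolution rows so every creature contributes two data points. The sketch below follows that reading. The column names (`cp`, `cp_new`, `attack_weak_value`, and so on) and the toy numbers are assumptions of mine, not the data set's actual schema or values.

```python
import pandas as pd

def lengthen(df):
    """Stack pre- and post-evolution rows, with weak + strong attack summed per stage."""
    pre = pd.DataFrame({
        "species": df["species"],
        "stage": "pre",
        "cp": df["cp"],
        "attack_total": df["attack_weak_value"] + df["attack_strong_value"],
    })
    post = pd.DataFrame({
        "species": df["species"],
        "stage": "post",
        "cp": df["cp_new"],
        "attack_total": df["attack_weak_value_new"] + df["attack_strong_value_new"],
    })
    # Twice as many rows as the input: one per creature per stage.
    return pd.concat([pre, post], ignore_index=True)

# Hypothetical toy rows; values are invented for illustration only.
toy = pd.DataFrame({
    "species": ["Pidgey", "Eevee"],
    "cp": [100, 600],
    "cp_new": [190, 1500],
    "attack_weak_value": [8, 10],
    "attack_strong_value": [30, 35],
    "attack_weak_value_new": [9, 15],
    "attack_strong_value_new": [45, 60],
})

long_df = lengthen(toy)
print(long_df)
```

Keeping a `stage` column is optional, but it makes it easy to check later whether the pre- and post-evolution points follow the same trend.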



There’s actually a pretty strong linear relationship between combat power and combined attack strength, but 33% of the variance in combat power is not explained by combined attack strength in such a model. Where does the rest of the combat power score come from?
That’s a question for another time. Maybe Niantic will be the source of the answer as Pokémon Go continues to thrive in the wake of its unprecedented success.
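For readers who want to reproduce a figure like the 33% quoted above: in a simple linear model, the unexplained share of variance is the residual sum of squares divided by the total sum of squares, i.e. one minus R². The sketch below uses NumPy's least-squares line fit; the function name `unexplained_variance` and the toy numbers are hypothetical, and this is not necessarily how the original figure was produced.

```python
import numpy as np

def unexplained_variance(x, y):
    """Fraction of the variance in y left unexplained by a straight-line fit on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ordinary least-squares line: y is approximated by slope * x + intercept.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # variation the line misses
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
    return ss_res / ss_tot                   # equals 1 - R^2

# Hypothetical toy values standing in for combined attack strength and combat power.
attack_total = [1, 2, 3, 4]
combat_power = [1, 3, 2, 4]

share = unexplained_variance(attack_total, combat_power)
print(round(share, 2))
```

With a single predictor, this quantity is also one minus the squared correlation between the two variables, which gives a quick cross-check.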

©ODSC 2016

Gordon Fleetwood

Gordon studied Math before immersing himself in Data Science. Originally a die-hard Python user, R's tidyverse ecosystem gradually subsumed his workflow until only scikit-learn remained untouched. He is fascinated by the elegance of robust data-driven decision making in all areas of life, and is currently involved in applying these techniques to the EdTech space.