An Intro to Pokémon Showdown AI for Researchers

This is an informal introduction to the Battling Track aimed at readers who are familiar with machine learning but new to Pokémon Showdown. We'll introduce key terminology and offer some (slightly opinionated) perspectives on how Pokémon concepts shape the challenges and opportunities for AI research.

Pokémon Battling as an AI Problem

Competitive Pokémon turns the Pokémon franchise's turn-based combat mechanic into a standalone two-player strategy game. Players design teams of Pokémon and battle against an opponent. On each turn, each player chooses either to use a move with the Pokémon they already have on the field or to switch to another member of their team. Moves can deal damage to the opposing Pokémon, eventually causing it to faint. The battle continues until only one player has Pokémon left, and that player wins.

Pokémon Battle Diagram

As an AI benchmark, Pokémon is most defined by:

  1. Team Design: Teams are created by choosing six species from the hundred(s) that are available. Each Pokémon then needs four moves, an item, an ability, and custom statistics. Players design teams to counter common trends and then design new teams to counter those counters, and so on. The process of selecting a team and then battling with it is a challenging two-stage optimization problem, and the game is always evolving.
  2. Generalization: Diverse team combinations create an incredibly wide range of initial states, and each matchup is its own strategic puzzle. Agents have to learn to adapt their strategy by weighing the strengths and weaknesses of their own team against the threats and opportunities presented by their opponent.
  3. Stochasticity: Turn outcomes are random; there are a lot of things that could happen after each move, and a single turn can make or break a battle. The better player does not always win. In fact, the very best players are (only) 75–90% favorites against a randomly sampled player, depending on the ruleset.
  4. Imperfect Information and Opponent Prediction: Battles revolve around team information that has or has not been revealed to the opponent. Inferring unrevealed Pokémon/items/moves can be a major advantage but requires detailed understanding of current team design trends. Pokémon is a simultaneous-move game and the value of each action is highly dependent on the opponent's decision. Listen to any good player discussing their thought process during a battle and the main thing you'll notice is how much time they spend on team inference and move prediction. Here's one example (no need to watch more than a minute to get the idea): example commentary.
  5. Datasets: Between its battle replays, team design stats, forums, and wikis, Pokémon is a goldmine of naturally occurring training data.
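
To make the simultaneous-move point from item 4 concrete, here is a minimal sketch of how the best action flips with your belief about the opponent's choice. The payoff table, the action names, and the helper functions are all invented for illustration; real battles have many more actions and no known payoff table.

```python
# Hypothetical win probabilities for "my" side. Rows are my actions,
# columns are the opponent's simultaneous actions. All numbers are made up.
PAYOFF = {
    "attack": {"stay": 0.70, "switch": 0.35},
    "switch": {"stay": 0.45, "switch": 0.55},
}


def expected_values(payoff, opponent_policy):
    """Expected win probability of each of my actions under a belief about the opponent."""
    return {
        mine: sum(opponent_policy[theirs] * value for theirs, value in row.items())
        for mine, row in payoff.items()
    }


def best_response(payoff, opponent_policy):
    """Pick the action with the highest expected win probability."""
    values = expected_values(payoff, opponent_policy)
    return max(values, key=values.get)


# Two different beliefs about what the opponent will do this turn.
likely_stays = {"stay": 0.8, "switch": 0.2}
likely_switches = {"stay": 0.2, "switch": 0.8}

print(best_response(PAYOFF, likely_stays))     # attack
print(best_response(PAYOFF, likely_switches))  # switch
```

The same position calls for opposite actions depending on the prediction, which is why strong players spend so much effort on reading the opponent.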

Let's Play

The best way to get a feel for the problem is to play a battle yourself! It takes under a minute to get into a match against bots on the PokéAgent Ladder. It's a fun way to play low-stakes battles against opponents who won't keep you waiting or talk trash when you lose :)

1) Copy a Sample Team

Sample Gen 1 OU team to copy-paste:
Alakazam
EVs: 252 HP / 252 Def / 252 SpA / 252 SpD / 252 Spe
IVs: 2 Atk
- Thunder Wave
- Seismic Toss
- Psychic
- Recover

Chansey
EVs: 252 HP / 252 Def / 252 SpA / 252 SpD / 252 Spe
IVs: 2 Atk
- Thunder Wave
- Ice Beam
- Thunderbolt
- Soft-Boiled

Gengar
- Hypnosis
- Thunderbolt
- Seismic Toss
- Explosion

Snorlax
- Body Slam
- Earthquake
- Hyper Beam
- Self-Destruct

Tauros
- Body Slam
- Earthquake
- Hyper Beam
- Blizzard

Starmie
EVs: 252 HP / 252 Def / 252 SpA / 252 SpD / 252 Spe
IVs: 2 Atk
- Thunder Wave
- Blizzard
- Psychic
- Recover
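
If you want to handle a team like this in code, the text format is easy to read into a small data structure. The sketch below only covers the fields that appear in the sample above (a species line, optional EVs and IVs lines, and moves prefixed with "- "). The `PokemonSet` class and `parse_team` function are hypothetical names, not part of any Showdown library, and real team exports can carry more fields (items, abilities, and so on) that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class PokemonSet:
    species: str
    evs: dict = field(default_factory=dict)
    ivs: dict = field(default_factory=dict)
    moves: list = field(default_factory=list)


def _parse_stats(text):
    """Turn '252 HP / 252 Def' into {'HP': 252, 'Def': 252}."""
    stats = {}
    for part in text.split("/"):
        value, name = part.split()
        stats[name] = int(value)
    return stats


def parse_team(text):
    """Parse the simplified team text shown above into a list of PokemonSet."""
    team = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines just separate team members
        if line.startswith("- "):
            team[-1].moves.append(line[2:].strip())
        elif line.startswith("EVs:"):
            team[-1].evs = _parse_stats(line[len("EVs:"):])
        elif line.startswith("IVs:"):
            team[-1].ivs = _parse_stats(line[len("IVs:"):])
        else:
            # Anything else is treated as the species line of a new member.
            team.append(PokemonSet(species=line))
    return team


SAMPLE = """\
Alakazam
EVs: 252 HP / 252 Def / 252 SpA / 252 SpD / 252 Spe
IVs: 2 Atk
- Thunder Wave
- Seismic Toss
- Psychic
- Recover

Gengar
- Hypnosis
- Thunderbolt
- Seismic Toss
- Explosion
"""

team = parse_team(SAMPLE)
print([member.species for member in team])
print(team[0].evs["Spe"], team[0].moves)
```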

2) Open the PokéAgent Ladder

3) Follow This Short Video


How to start a battle on the PokéAgent ladder

All You Need to Know About Pokémon Showdown

Competitive Pokémon might be the most vocabulary-intensive game ever made. There are a lot of Named Things™ to know about (more than 1,000 Pokémon, just for starters). The terminology can be overwhelming, but the starter resources use few (if any) Pokémon-specific heuristics and are aimed at an ML audience. Still, you'll need a few vocabulary terms to follow their instructions and the conversations on Discord:

  • Pokémon Showdown (or just "Showdown") is a popular online Pokémon battle simulator (play.pokemonshowdown.com). It is an open-source platform, and we use it to host the Battling evaluations.
  • Generations (or "gen") refer to major releases of the Pokémon video game franchise. There are currently nine generations ("Gen 1", ... , "Gen 9"). Each generation adds new Pokémon and changes the gameplay mechanics ("what exactly happens when I pick this move?"). Showdown simulates battles for each generation.
  • Tiers are how Showdown excludes Pokémon that are overpowered so that gameplay stays balanced and varied. Showdown has a good article about the tiering system. PokéAgent only features the flagship tier "OverUsed" (or "OU"). All you need to know is that fewer than 60 Pokémon are powerful enough to be allowed only in OU, and these are the most common choices you'll run into. However, Pokémon from lower tiers ("UnderUsed", "NeverUsed", etc.) are also allowed and fill niche roles.
  • Battle Format (or just "Format") refers to a combination of the generation in which a battle takes place, the tier of Pokémon that are allowed, and any additional rules. Because we are only considering OU, you'll mainly be seeing terms like "Gen1OU", "Gen2OU", ... "Gen9OU". Think of each battle format as a distinct game with some high-level similarities.
  • VGC refers to the "doubles" battle format officially endorsed by Pokémon (design a team of 6 Pokémon, pick 4, play 2 vs. 2). Despite shorter battle lengths, VGC is widely considered more complicated than singles because there are far more potential actions per turn.
  • Elo, GXE, and Glicko-1 are player skill metrics. Elo and Glicko-1 are not unique to Pokémon. Showdown's matchmaking tends to pair people with similar Elo ratings. GXE corrects for matchmaking bias and approximates your % chance of defeating a randomly sampled opponent. Here is a Showdown article with more information.
  • Ladders are the ranked battle system on Showdown. You request a battle, match against an unknown player, and play a single game. Showdown maintains a global leaderboard and updates both players' ratings after the battle. There are separate ladders for each battle format.
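
For readers who haven't met Elo before, here is a minimal sketch of the textbook formula. This is the generic Elo model, not Showdown's exact implementation: the K-factor of 32 is an assumed illustrative value, and the function names are hypothetical. GXE and Glicko-1 use different formulas that are not shown here.

```python
def elo_expected_score(rating_a, rating_b):
    """Textbook Elo: probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return both players' new ratings after one game.

    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a tie.
    k=32 is an assumed illustrative K-factor, not Showdown's setting.
    """
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Equal ratings: each player is expected to score 0.5.
print(elo_expected_score(1500, 1500))

# A 400-point favorite is expected to score about 0.91.
print(round(elo_expected_score(1900, 1500), 2))

# An upset moves ratings more than an expected result does.
print(elo_update(1500, 1900, score_a=1.0))
```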

What Do the Different Battle Formats Mean for AI?

Battle Formats Summary

Gen1OU and Gen9OU cover both ends of a few important trends:

Team Design Space

Trend: Every generation adds Pokémon, moves, and other team design choices. The number of available team compositions dramatically increases over the generations.

AI Takeaways: Agents must generalize over more diverse team choices in later generations. Expect later generations to require more data and stronger representations to reach the same performance.
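
A quick back-of-the-envelope count shows how fast the design space grows. The pool sizes and movepool size below are hypothetical round numbers, not the actual counts for any generation, and the estimate ignores items, abilities, and custom statistics, so treat it as a rough lower bound on the trend rather than a real figure.

```python
from math import comb


def species_combinations(pool_size, team_size=6):
    """Number of ways to choose an unordered team of distinct species."""
    return comb(pool_size, team_size)


def team_configurations(pool_size, movepool_size, team_size=6, moves_per_pokemon=4):
    """Species choices times move choices, assuming every species has the
    same (hypothetical) movepool size. Items, abilities, and stats are ignored."""
    move_choices = comb(movepool_size, moves_per_pokemon)
    return species_combinations(pool_size, team_size) * move_choices ** team_size


# Hypothetical pool sizes standing in for "earlier" vs. "later" generations.
for pool in (100, 200, 400):
    print(pool, species_combinations(pool), team_configurations(pool, movepool_size=20))
```

Even with a fixed movepool size, doubling the species pool multiplies the number of species combinations by far more than two.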

Data Availability

Trend: The latest generation (currently: Gen 9) is by far the most popular. There are more Gen 9 battles played per day than every other generation combined.

AI Takeaways: Available replay data conveniently increases alongside the previously mentioned demand for more data.

Planning Horizon

Trend: Offensive power has increased over time. The average length of a battle drops sharply over the early generations, then mostly levels out.

AI Takeaways: Planning horizons decrease over generations, but it becomes harder to recover from mistakes (or bad luck). Search may be more useful in later gens.

Imperfect Information

Trend: Before Gen 5, you begin with zero information about your opponent's team. From Gen 5 onward, Showdown reveals the opponent's Pokémon before the battle begins ("Team Preview").

AI Takeaways: Gens 1–4 emphasize opponent team prediction. Team Preview weakens the otherwise obvious trend that more team combinations lead to more imperfect information.
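
Since the generation determines whether Team Preview applies, it can be handy to pull it out of a format name in code. The sketch below parses names written in this article's style ("Gen1OU", "Gen9OU") and applies the Gen 5 cutoff described above. `BattleFormat` and `parse_format` are hypothetical helpers, and the pattern will not match format names that carry extra rules or digits after the tier.

```python
import re
from dataclasses import dataclass

# Matches names like "Gen1OU" or "gen9ou": "gen", a number, then a letters-only tier.
FORMAT_PATTERN = re.compile(r"^gen(\d+)([a-z]+)$")

# Per the text above, Team Preview applies from Gen 5 onward.
TEAM_PREVIEW_FIRST_GEN = 5


@dataclass(frozen=True)
class BattleFormat:
    generation: int
    tier: str
    team_preview: bool


def parse_format(name):
    """Split a format name into generation, tier, and a Team Preview flag."""
    match = FORMAT_PATTERN.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized format name: {name!r}")
    generation = int(match.group(1))
    return BattleFormat(
        generation=generation,
        tier=match.group(2).upper(),
        team_preview=generation >= TEAM_PREVIEW_FIRST_GEN,
    )


print(parse_format("Gen1OU"))
print(parse_format("Gen9OU"))
```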

Non-Stationarity

Trend: Turning Pokémon into a balanced competitive strategy game is hard and requires frequent rule changes, especially in the first few months after a new generation is released.

AI Takeaways: Manual rule changes and evolving strategies create non-stationary datasets. If you are imitating a replay from 2015, you are imitating the decisions of a player who thought they were up against a different set of teams and strategies than you'd see on the ladder today.
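
One simple way to act on this, offered as an assumption rather than a recommendation from the starter resources, is to down-weight older replays when sampling training data. The sketch below uses an exponential decay with a hypothetical one-year half-life. `recency_weight` is an invented helper, and the right half-life (or whether to weight at all) is an empirical question.

```python
from datetime import date


def recency_weight(replay_date, reference_date, half_life_days=365.0):
    """Exponential-decay weight: 1.0 for a replay played on the reference date,
    0.5 for one that is half_life_days older, and so on.
    Replays dated after the reference date are treated as age zero."""
    age_days = max((reference_date - replay_date).days, 0)
    return 0.5 ** (age_days / half_life_days)


today = date(2024, 1, 1)  # hypothetical "now" for the example
for played in (date(2024, 1, 1), date(2023, 1, 1), date(2015, 1, 1)):
    print(played.isoformat(), recency_weight(played, today))
```

These weights could be passed to a weighted sampler such as `random.choices(replays, weights=...)` so that recent battles dominate each training batch without discarding old data entirely.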

Learning More About Pokémon

  • Smogon and its forums have introductory articles and strategy guides for every generation. See Gen1OU and Gen9OU.
  • Academic papers about Pokémon AI cannot assume readers are Pokémon players, so their background sections can offer a more formal introduction. Examples include PokéChamp, Metamon, and the PokéAgent Challenge whitepaper.
  • Stumble across a vocabulary word you don't recognize? Search it on Bulbapedia.