Speedrunning

Long-Horizon RPG Gameplay

RPG Speedrunning

This track challenges agents to complete a full Pokémon role-playing game (Pokémon Emerald) as quickly and efficiently as possible, navigating a massive, partially observable world with hundreds of NPCs and thousands of possible actions.

Long-horizon planning, efficient exploration, and strategic resource management are critical. Agents must balance immediate objectives with long-term strategic goals, making decisions that span thousands of timesteps while adapting to the unpredictable nature of RPG gameplay.

The speedrunning challenge stresses sequential decision-making: competitive completion times in a complex, open-world environment depend on planning over long horizons and on careful resource management.

Starter Kit

The starter kit provides a real-time agent loop with modular components for perception (game frame recognition), planning and memory (long-term vs. short-term goals, knowledge storage), and control (emulator action execution).
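The perception, planning and memory, and control split described above can be pictured as a simple loop. The following is only a minimal sketch with stubbed components: every name in it (`Memory`, `perceive`, `plan`, `run_agent`, the `get_frame` and `send_button` callables, and the button list) is hypothetical and is not taken from the starter kit's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical button set; the real emulator API may expose different actions.
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")


@dataclass
class Memory:
    """Planning state: one long-term goal, short-term goals, stored notes."""
    long_term_goal: str = "earn the first gym badge"
    short_term_goals: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def perceive(frame: bytes) -> dict:
    """Perception stub: a real agent would run frame recognition here."""
    return {"in_dialogue": frame.startswith(b"DLG")}


def plan(observation: dict, memory: Memory) -> str:
    """Planning stub: choose a button from the observation and the goals."""
    if observation["in_dialogue"]:
        memory.notes.append("advanced dialogue")
        return "A"
    if not memory.short_term_goals:
        memory.short_term_goals.append("explore north")
    return "UP"


def run_agent(get_frame: Callable[[], bytes],
              send_button: Callable[[str], None],
              max_steps: int) -> Memory:
    """Control loop: perceive, plan, act, for at most max_steps steps."""
    memory = Memory()
    for _ in range(max_steps):
        observation = perceive(get_frame())
        button = plan(observation, memory)
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        send_button(button)
    return memory


if __name__ == "__main__":
    # Fake emulator: three canned frames and a list that records presses.
    frames = iter([b"DLG hello", b"overworld", b"overworld"])
    pressed: List[str] = []
    run_agent(lambda: next(frames), pressed.append, max_steps=3)
    print(pressed)
```

Keeping the emulator behind two plain callables means each component can be swapped or tested without a running game; in a real submission the `plan` step would be where the neural network produces the final action.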

What's Included

  • Agent Scaffolding: Modular framework for building Pokémon Emerald speedrunning agents
  • Pokémon Emerald Wrapper: Custom emulator API for real-time game interaction
  • Baseline Implementation: Reference agent with VLM setup and basic planning
  • Evaluation Tools: Automated testing and performance measurement

Submission Guidelines

Submissions focus on achieving maximum game completion under time constraints. Your agent must interact with the game exclusively through the custom Pokémon Emerald emulator API. You may use any method, as long as the final action is produced by a neural network.

Important: All submissions undergo anti-cheat verification to ensure fair evaluation, including validation of agent behavior and action logs.

Submission Requirements

  • Code Archive: Your agent implementation as a ZIP or TAR.GZ file, including all dependencies and a README
  • Action & State Logs: The submission.log and detailed logs generated by the starter kit during your agent's run
  • Methodology Description: A brief document describing your approach across five dimensions (State Information, Tools, Memory, Feedback, Fine-tuning)
  • Video Evidence: A YouTube link to a screen recording showing the complete speedrun
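Before uploading, it may help to confirm that the code archive opens and contains a README. The following is a rough pre-submission check, not an official validator: the function names are hypothetical, and the rule that any file whose name starts with "readme" counts is an assumption rather than a stated requirement.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath
from typing import List


def archive_members(path: str) -> List[str]:
    """Return the member names of a ZIP or TAR(.GZ) archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            return tf.getnames()
    raise ValueError(f"{path} is neither a ZIP nor a TAR archive")


def has_readme(path: str) -> bool:
    """Assumed rule: any member whose file name starts with 'readme'."""
    return any(
        PurePosixPath(name).name.lower().startswith("readme")
        for name in archive_members(path)
    )
```

Calling `has_readme("my_agent.zip")` (a hypothetical file name) returns `True` or `False`; a file that is not a valid archive raises `ValueError`.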

Code Modification Policy

You are encouraged to modify, extend, or completely rewrite the starter kit code. The only requirement is that your submission includes valid logs generated by the starter kit's logging system.

Ranking Criteria

Rankings are determined by raw performance metrics: the number of actions and the time taken.

Primary Ranking Components

  • Milestone Completion: Percentage of game milestones accomplished (gym badges, story progression)
  • Completion Efficiency: Time and action count to achieve milestones
  • Reproducibility: Clear documentation and verifiable results
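To make the interplay of these components concrete, here is one possible way to order runs. The track description does not say how the metrics are combined, so the ordering used here (higher milestone completion first, then less time, then fewer actions) is purely an assumption, as are the `RunResult` fields and the example numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunResult:
    team: str
    milestones_done: int
    milestones_total: int
    seconds: float
    actions: int

    @property
    def completion(self) -> float:
        """Fraction of milestones accomplished, between 0 and 1."""
        return self.milestones_done / self.milestones_total


def rank(results: List[RunResult]) -> List[RunResult]:
    """Assumed ordering: completion (high), then time (low), then actions (low)."""
    return sorted(results, key=lambda r: (-r.completion, r.seconds, r.actions))


if __name__ == "__main__":
    # Made-up runs for illustration only.
    runs = [
        RunResult("a", 4, 8, 3600.0, 9000),
        RunResult("b", 6, 8, 7200.0, 15000),
        RunResult("c", 6, 8, 7000.0, 16000),
    ]
    for place, run in enumerate(rank(runs), start=1):
        print(place, run.team, f"{run.completion:.0%}", run.seconds, run.actions)
```

Under this assumed ordering a slower run that reaches more milestones still places above a faster run that reaches fewer; the organizers' actual scoring may weigh these differently.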

Novel Methods Welcome

While we provide a starter kit with an LLM-scaffolded approach, we encourage submissions using diverse methods: tool-augmented systems, reinforcement learning, purely text-based reasoning, hybrid architectures, and other innovative techniques.

Methodology Documentation

Teams document their methodology across five dimensions:

  • State Information (S): Raw pixels vs. parsed game state vs. privileged information
  • Tools (T): External tools during gameplay (web search, calculators, planning utilities)
  • Memory (M): Memory mechanisms beyond immediate context (vector DBs, knowledge graphs)
  • Feedback (F): Human or automated feedback during runs
  • Fine-tuning (Φ): Specialized training on Pokémon data
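One way to keep the five dimensions organized while drafting the methodology description is a small structured record. The sketch below is only illustrative: the class names, field types, and JSON output are assumptions and do not reflect any required submission format.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List


class StateInfo(Enum):
    """The three state-information options named in the list above."""
    RAW_PIXELS = "raw pixels"
    PARSED_STATE = "parsed game state"
    PRIVILEGED = "privileged information"


@dataclass
class Methodology:
    state_information: StateInfo   # S
    tools: List[str]               # T
    memory: List[str]              # M
    feedback: str                  # F
    fine_tuning: str               # Φ

    def to_json(self) -> str:
        """Serialize the record, storing the enum by its readable value."""
        record = asdict(self)
        record["state_information"] = self.state_information.value
        return json.dumps(record, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Example values are invented for illustration.
    doc = Methodology(
        state_information=StateInfo.RAW_PIXELS,
        tools=["planning utility"],
        memory=["vector DB"],
        feedback="none",
        fine_tuning="none",
    )
    print(doc.to_json())
```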