Long-Horizon RPG Gameplay
This track challenges agents to complete a full Pokémon role-playing game (Pokémon Emerald) as quickly and efficiently as possible, navigating a massive, partially observable world with hundreds of NPCs and thousands of possible actions.
Long-horizon planning, efficient exploration, and strategic resource management are critical. Agents must balance immediate objectives with long-term strategic goals, making decisions that span thousands of timesteps while adapting to the unpredictable nature of RPG gameplay.
The speedrunning challenge pushes AI systems to their limits in sequential decision-making, requiring sophisticated planning algorithms and efficient resource management to achieve optimal completion times in complex, open-world environments.
The starter kit is built around a real-time agent loop with modular components for perception (game frame recognition), planning and memory (long-term vs. short-term goals, knowledge storage), and control (emulator action execution).
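The perception, planning and memory, and control loop described above can be sketched roughly as follows. This is a minimal illustration, not the starter kit's code: the `Emulator` interface, its `get_frame` and `press` methods, the `Memory` fields, and the button names are all hypothetical stand-ins for the real emulator API, and the planner is a placeholder where a neural network would choose the action.

```python
from dataclasses import dataclass, field
from typing import List, Protocol

# Hypothetical button set; the real emulator API may name these differently.
BUTTONS = ("A", "B", "UP", "DOWN", "LEFT", "RIGHT", "START", "SELECT")


class Emulator(Protocol):
    """Hypothetical emulator interface, assumed for illustration only."""

    def get_frame(self) -> bytes: ...
    def press(self, button: str) -> None: ...


@dataclass
class Memory:
    """Long-term and short-term goals plus a simple knowledge store."""

    long_term_goal: str = "complete the game"
    short_term_goal: str = "leave the starting town"
    notes: List[str] = field(default_factory=list)


def perceive(frame: bytes) -> dict:
    """Perception placeholder: a real agent would recognize the game frame."""
    return {"frame_size": len(frame)}


def plan(observation: dict, memory: Memory) -> str:
    """Planning placeholder: store what was seen, then pick a button.

    In a real submission the final action would come from a neural network.
    """
    memory.notes.append(f"saw frame of {observation['frame_size']} bytes")
    return BUTTONS[len(memory.notes) % len(BUTTONS)]


def run_agent(emulator: Emulator, memory: Memory, max_steps: int) -> List[str]:
    """Run the perceive -> plan -> act loop and return the actions taken."""
    actions: List[str] = []
    for _ in range(max_steps):
        observation = perceive(emulator.get_frame())  # perception
        action = plan(observation, memory)            # planning and memory
        emulator.press(action)                        # control
        actions.append(action)
    return actions


class FakeEmulator:
    """Stand-in emulator so the sketch runs without a game ROM."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def get_frame(self) -> bytes:
        return bytes(240 * 160)  # blank frame at the GBA screen resolution

    def press(self, button: str) -> None:
        self.pressed.append(button)
```

Keeping the three stages as separate functions means any one of them can be swapped (for example, replacing `plan` with a learned policy) without touching the loop itself.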
Submissions focus on achieving maximum game completion under time constraints. Your agent must interact exclusively through the custom Pokémon Emerald emulator API. Use any method, as long as the final action comes from a neural network.
Important: All submissions undergo anti-cheat verification to ensure fair evaluation, including validation of agent behavior and action logs.
Required files: submission.log and the detailed logs generated by the starter kit during your agent's run.
You are encouraged to modify, extend, or completely rewrite the starter kit code. The only requirement is that your submission includes valid logs generated by the starter kit's logging system.
Rankings are determined by raw performance metrics: number of actions and time.
While we provide a starter kit with an LLM-scaffolded approach, we encourage submissions using diverse methods: tool-augmented systems, reinforcement learning, purely text-based reasoning, hybrid architectures, and other innovative techniques.
Teams document their methodology across five dimensions: