3D Virtual World AI Agent "SIMA 2" That Thinks and Learns Like Humans

Google DeepMind has unveiled "SIMA 2" (Scalable Instructable Multiworld Agent 2), the second version of an AI agent that sees, moves, and learns in 3D games like a person. This version has evolved beyond simply following commands like "turn left" or "climb the ladder" to become a "gaming partner AI" that understands goals, makes its own plans, and improves its capabilities over time. DeepMind describes SIMA 2 as "an important advance toward Artificial General Intelligence (AGI) and a core testing ground for 'embodied intelligence' that will expand to robots and the physical world."

SIMA 1 was a general-purpose language-action agent that could perform over 600 basic actions in multiple commercial games without accessing game code or APIs, using the screen as its "eyes" and a virtual keyboard and mouse as its "hands." SIMA 2 goes a step further by integrating Google's Gemini model at the agent's core. It is designed not just to follow instructions but to reason about "why this action should be taken and in what order to achieve the goal."

SIMA 2 can understand a user's high-level goal, break it into the sub-steps needed to achieve it, observe the game environment, and translate the plan into actual keyboard and mouse actions. It was trained on data that combines human gameplay videos with language descriptions and behavior/explanation labels generated by Gemini. As a result, SIMA 2 can explain in natural language "what it's trying to do and why it chose this sequence."
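The goal-to-action flow described above can be pictured with a toy sketch. Nothing here reflects DeepMind's actual implementation: the `PLANS` and `CONTROLS` tables, the `Action` type, and the `plan`, `observe`, and `run` functions are all hypothetical stand-ins, with simple lookups playing the role that a language model and a vision system would play in a real agent.

```python
from dataclasses import dataclass


@dataclass
class Action:
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "press W"


# Hypothetical lookup standing in for a language-model planner.
PLANS = {
    "build a campfire": ["collect wood", "collect stone", "place campfire"],
}

# Hypothetical mapping from each sub-step to low-level inputs.
CONTROLS = {
    "collect wood": [Action("keyboard", "press W"), Action("mouse", "click left")],
    "collect stone": [Action("keyboard", "press W"), Action("mouse", "click left")],
    "place campfire": [Action("keyboard", "press E")],
}


def plan(goal):
    """Break a high-level goal into ordered sub-steps."""
    return PLANS.get(goal, [])


def observe(frame, step):
    """Return True if the (simulated) screen shows the step is already done."""
    return step in frame.get("completed", set())


def run(goal, frame):
    """Plan, check the environment, and emit actions plus a plain-language log."""
    actions, log = [], []
    for step in plan(goal):
        if observe(frame, step):
            log.append(f"skip '{step}': already done")
            continue
        actions.extend(CONTROLS[step])
        log.append(f"do '{step}' because it is needed for '{goal}'")
    return actions, log


actions, log = run("build a campfire", {"completed": {"collect wood"}})
for line in log:
    print(line)
```

The log lines mimic, in a very reduced form, the idea that the agent can state what it is doing and why.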

Another key feature is generalization ability. While the previous version saw performance drop sharply in games outside its training set, SIMA 2 showed meaningful performance in the Viking survival game ASKA, a research version of Minecraft, and games featuring open-world exploration and resource collection. DeepMind also introduced a self-improvement cycle: Gemini provides initial tasks and estimated rewards, and SIMA 2 learns from the experience it generates, with each subsequent generation training on that data. This lets it improve on previously failed tasks without human demonstrations or intervention. The researchers say that interacting with SIMA 2 feels less like "giving commands to AI" and more like "discussing while playing together."
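The self-improvement cycle described above can be illustrated with a deliberately simplified simulation. This is a sketch under stated assumptions, not DeepMind's training procedure: `propose_tasks` and `estimate_reward` are hypothetical stand-ins for the task-setting and reward-estimating roles attributed to Gemini, and the agent's "skill" is reduced to a per-task success probability that is nudged upward by rewarded experience.

```python
import random


def propose_tasks():
    """Hypothetical stand-in for the model that sets tasks."""
    return ["chop tree", "mine ore", "light torch"]


def estimate_reward(task, succeeded):
    """Hypothetical stand-in for the model that scores an attempt."""
    return 1.0 if succeeded else 0.0


def self_improve(generations=5, attempts=20, seed=0):
    """Run several generations; return the mean skill after each one."""
    rng = random.Random(seed)
    tasks = propose_tasks()
    skill = {task: 0.1 for task in tasks}  # assumed low starting ability
    history = []
    for _ in range(generations):
        # 1. Collect self-generated experience with estimated rewards.
        experience = []
        for task in tasks:
            for _ in range(attempts):
                succeeded = rng.random() < skill[task]
                experience.append((task, estimate_reward(task, succeeded)))
        # 2. The next generation learns from that experience (toy update rule).
        for task, reward in experience:
            skill[task] = min(1.0, skill[task] + 0.05 * reward)
        history.append(sum(skill.values()) / len(skill))
    return history


print(self_improve())
```

No human demonstration enters the loop: every update comes from attempts the agent made itself, scored by the reward estimator. The update rule and the numbers are arbitrary choices made for illustration.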