2025-11-13 · Google

SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds

agents · models


Source: DeepMind · Date: 2025-11-13 · URL: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

Summary

DeepMind’s SIMA 2 integrates Gemini into an embodied agent for 3D worlds, moving beyond SIMA 1’s instruction-following to goal reasoning, conversational interaction, and self-improvement via Gemini feedback loops. Trained on 13 games (including Valheim, Satisfactory, No Man’s Sky), SIMA 2 “closes a significant portion of the gap to human performance” on training environments and generalizes to unseen games and Genie 3-generated procedural worlds. DeepMind also reports self-improvement without human demonstrations, achieved through bootstrapped trial and error.

Implications

Self-improvement through Gemini critique is the structural leap. Using Gemini to supply the feedback for agent improvement, with no human demonstrations required, applies the RL-from-AI-feedback pattern to embodied 3D environments. It is the same trajectory as RLHF → RLAIF in language models, now extended to spatial reasoning. If this generalizes, the training data bottleneck for embodied agents softens considerably.
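To make the feedback-loop pattern concrete, here is a toy sketch of self-improvement from AI feedback: the agent attempts tasks, a critic scores the attempts, and only the best-scored attempts are used for the next update. Everything in it is an illustrative assumption rather than DeepMind's method: `ToyAgent`, `critic_score`, and `self_improve` are hypothetical names, the "agent" is a single scalar skill value, and the critic simply reads a quality number where a real system would query a model such as Gemini.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an embodied agent: one scalar skill in [0, 1]."""
    skill: float = 0.2

    def attempt(self, task: str, rng: random.Random) -> dict:
        # A "trajectory" here is just a noisy quality number, not real gameplay.
        quality = min(1.0, max(0.0, rng.gauss(self.skill, 0.15)))
        return {"task": task, "quality": quality}

    def train_on(self, trajectories: list, lr: float = 0.5) -> None:
        # Move skill part of the way toward the mean quality of kept attempts.
        if not trajectories:
            return
        target = sum(t["quality"] for t in trajectories) / len(trajectories)
        self.skill += lr * (target - self.skill)


def critic_score(trajectory: dict) -> float:
    """Hypothetical AI critic. Assumption: it can read the attempt's quality."""
    return trajectory["quality"]


def self_improve(agent: ToyAgent, tasks: list, rounds: int = 5,
                 attempts_per_task: int = 8, keep_fraction: float = 0.25,
                 seed: int = 0) -> list:
    """Run attempt -> critique -> filter -> update, with no human demonstrations."""
    rng = random.Random(seed)
    history = [agent.skill]
    for _ in range(rounds):
        scored = []
        for task in tasks:
            for _ in range(attempts_per_task):
                traj = agent.attempt(task, rng)
                scored.append((critic_score(traj), traj))
        # Keep only the top-scored attempts as the agent's own training data.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, int(len(scored) * keep_fraction))
        agent.train_on([traj for _, traj in scored[:keep]])
        history.append(agent.skill)
    return history


if __name__ == "__main__":
    agent = ToyAgent(skill=0.2)
    print(self_improve(agent, ["chop tree", "build shelter", "find water"]))
```

The point of the sketch is the data flow: the only training signal is the critic's score over the agent's own attempts. Whether such a loop keeps improving in practice depends on how reliable the critic is, which this toy sidesteps by giving the critic perfect information.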

Generalization to unseen games matters more than training game performance. “Substantially outperforms SIMA 1” on held-out games is the result that matters, because it tests whether the agent has built general 3D world models or merely memorized game-specific patterns. The Genie 3 procedural world test is the hardest version of this, since those worlds are generated rather than taken from any existing game.

Games as the curriculum for embodied AI. With 13 games as training environments and Genie 3 as a procedural world generator, DeepMind is treating games as the embodied AI equivalent of internet text: diverse, structured, action-grounded. The path from game agent to real-world robotics agent is one of DeepMind’s core research bets.

Watch:

Whether SIMA 2’s self-improvement curve continues in extended training: does capability keep improving without new human data?
The Genie 3 procedural world generation → SIMA training loop as a scalable data synthesis pipeline
Timeline from game agent → real-world deployment, and whether Gemini Robotics uses similar architectures
