Introducing the Gemini 2.5 Computer Use model
Source: DeepMind Date: 2025-10-23 URL: https://deepmind.google/blog/introducing-the-gemini-25-computer-use-model/
Summary
Google released a specialized computer use model, built on Gemini 2.5 Pro, that operates UIs directly (click, type, scroll) where no structured API is available. On Browserbase Online-Mind2Web it reaches 70%+ accuracy at ~225 sec latency, the highest quality and lowest latency among the tested alternatives. It was benchmarked on Online-Mind2Web, WebVoyager, and AndroidWorld. In production it powers Project Mariner, the Firebase Testing Agent, and agentic features in Search AI Mode. It is optimized for browsers and mobile; desktop OS-level control is not yet optimized.
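To make the interaction model concrete, here is a minimal sketch of the kind of observe-act loop a client might run around a computer use model. Everything in it is hypothetical: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are illustrative names, the "model" is a scripted stub rather than a call to the Gemini API, and the real request and response shapes may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical UI action: click, type, scroll, or done."""
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it only records what the agent did."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real client would return image bytes; this returns a label.
        return f"screen-after-{len(self.log)}-actions"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.y})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def scripted_model(goal: str, screenshot: str, history: List[Action]) -> Action:
    """Stub policy standing in for the model: replays a fixed three-step plan."""
    plan = [
        Action("click", x=120, y=240),
        Action("type", text=goal),
        Action("scroll", y=400),
    ]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


ModelFn = Callable[[str, str, List[Action]], Action]


def run_agent(goal: str, browser: FakeBrowser, model: ModelFn,
              max_steps: int = 10) -> List[Action]:
    """Observe the screen, ask the model for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = model(goal, browser.screenshot(), history)
        if action.kind == "done":
            break
        browser.execute(action)
        history.append(action)
    return history


browser = FakeBrowser()
run_agent("search flights", browser, scripted_model)
print(browser.log)
```

The `max_steps` cap is a design choice for the sketch: a loop driven by model output needs a hard stop in case the model never signals completion.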
Implications
70%+ on Online-Mind2Web with the lowest latency is the integrators’ number. Web automation at roughly 70% accuracy with ~225-second latency is production-usable for many agent tasks: form filling, research workflows, software testing. What matters for Project Mariner, the Firebase Testing Agent, and agentic Search is that the latency-quality combination beats the tested alternatives.
Firebase Testing Agent is the developer distribution play. Embedding computer use in Firebase (Google’s primary app development platform) means Android and web developers get UI testing automation as a built-in service. That’s the same pattern as GitHub Copilot integrating into the developer workflow — capture the tool, not just the API.
Browser/mobile optimized, desktop not yet. The explicit carve-out for desktop OS control signals this is targeted at web and mobile automation markets first. Desktop computer use (macOS/Windows control) is Claude’s computer use territory; Google is competing on the web/mobile axis. Watch whether desktop support ships in 2026.
Watch:
- Online-Mind2Web accuracy improvements over time — 70% is useful but 85%+ is where autonomous web agents become genuinely reliable
- Whether Project Mariner’s multi-task parallelism (10 simultaneous tasks) uses this model as the underlying UI controller
- Anthropic and OpenAI computer use benchmark responses on the same evaluation suite
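One rough way to see why the first watch item contrasts 70% with 85%: if a workflow chains several tasks and each one succeeds independently at the benchmark rate, end-to-end success falls off quickly. The independence assumption and the `chain_success` helper are illustrative simplifications, not something the source measures.

```python
def chain_success(per_task_accuracy: float, num_tasks: int) -> float:
    """Probability that every task in a chain succeeds, assuming independence."""
    return per_task_accuracy ** num_tasks


for accuracy in (0.70, 0.85):
    rates = [round(chain_success(accuracy, n), 3) for n in (1, 3, 5)]
    print(accuracy, rates)
# 0.7 [0.7, 0.343, 0.168]
# 0.85 [0.85, 0.614, 0.444]
```

Under this assumption, a three-task chain completes about a third of the time at 70% per-task accuracy and about 61% of the time at 85%.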