Developing a computer use model
Tags: security, agents, models
Source: Anthropic Date: 2024-10-22 URL: https://www.anthropic.com/news/developing-computer-use
Summary
Anthropic described how it developed Claude’s computer use capability. The model was trained on a few pieces of simple software, such as a calculator and a text editor, with no internet access during training for safety reasons. It works from screenshots and counts how many pixels to move the cursor, vertically or horizontally, to click in the right place. It generalized rapidly beyond that narrow training distribution. OSWorld score at publication: 14.9% (vs. 7.7% for the next-best model and roughly 70-75% for humans). Limitations: slow, often error-prone, and unable to attempt some routine actions such as dragging and zooming. Safety classification: remains at ASL-2, with no increased frontier risks identified. Primary concern: prompt injection attacks.
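To make the pixel-counting idea concrete, here is a minimal sketch of the arithmetic involved: given a cursor position and a target region in screenshot coordinates, compute how many pixels to move horizontally and vertically. The `Point`, `Box`, and `pixel_offset` names and the example coordinates are hypothetical illustrations, not anything described in Anthropic's post, and the sketch says nothing about how the model itself performs this step.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Point:
    x: int  # pixels from the left edge of the screenshot
    y: int  # pixels from the top edge of the screenshot


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> Point:
        # Integer midpoint of the box, used as the click target.
        return Point((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def pixel_offset(cursor: Point, target: Box) -> Tuple[int, int]:
    """Return (dx, dy): pixels to move right and down to reach the target's center."""
    c = target.center()
    return c.x - cursor.x, c.y - cursor.y


# Hypothetical example: cursor at (100, 200), button spanning x 300..380, y 400..440.
dx, dy = pixel_offset(Point(100, 200), Box(300, 400, 380, 440))
print(dx, dy)  # 240 220
```

A negative `dx` or `dy` means moving left or up. The subtraction is trivial once the coordinates are known; the hard part for a model is reading those coordinates off a screenshot accurately.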
Implications
- Transparency about safety research and capability development. Publishing the training methodology for computer use is unusual: the post details a specific safety decision (no internet access during training) and the generalization behavior (the team reports being surprised by how rapidly the model generalized). The transparency is partly risk communication: Anthropic is setting expectations that this is an early-stage capability.
- Pixel-counting as a fundamental skill. The detail that the model had to learn to count pixels in order to position the cursor is a concrete example of how seemingly simple GUI interactions depend on non-obvious capabilities. It puts the 14.9% OSWorld score in context: impressive given the training data constraints, though still far below the 70-75% human range.
- Prompt injection as the primary risk. Naming prompt injection (rather than jailbreaks or misuse) as the primary computer use safety concern is technically precise: an agent that can execute computer actions is especially exposed to adversarial on-screen content that instructs it to take unintended actions.
- Watch: how the training approach evolves in subsequent versions; whether the prompt injection risk materializes in practice; the trajectory from 14.9% toward human-level OSWorld performance.
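For a rough sense of the distance between the reported OSWorld numbers, the figures quoted in the summary can be compared with simple arithmetic. This is only a sketch: the variable and function names are hypothetical, and the inputs are the percentages cited in this note.

```python
# OSWorld figures as quoted in this note (percent of tasks completed).
CLAUDE_SCORE = 14.9
NEXT_BEST_SCORE = 7.7
HUMAN_RANGE = (70.0, 75.0)


def osworld_gaps(score: float, baseline: float, human: tuple) -> dict:
    """Ratio to the next-best model and percentage-point gap to the human range."""
    low, high = human
    return {
        "ratio_to_next_best": score / baseline,
        "gap_to_human_low": low - score,
        "gap_to_human_high": high - score,
    }


gaps = osworld_gaps(CLAUDE_SCORE, NEXT_BEST_SCORE, HUMAN_RANGE)
print(f"{gaps['ratio_to_next_best']:.2f}x the next-best score")
print(f"{gaps['gap_to_human_low']:.1f} to {gaps['gap_to_human_high']:.1f} points below the human range")
```

On these figures the score is a little under twice the next-best result, while the gap to human performance is still several times the score itself.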