2024-10-22 · Anthropic

Developing a computer use model

security · agents · models

Source: Anthropic · Date: 2024-10-22 · URL: https://www.anthropic.com/news/developing-computer-use

Summary

Anthropic described how it developed Claude’s computer use capability. The model was trained on a few pieces of simple software, such as a calculator and a text editor, with no internet access during training for safety reasons. It works from screenshots and reasons about where to move the cursor by counting pixels. It generalized rapidly beyond this narrow training distribution. Current OSWorld score: 14.9% (vs. 7.7% for the next-best model; human performance is 70-75%). Limitations: slow, error-prone, and unable to perform some routine actions such as dragging and zooming. Safety classification: ASL-2, with no increased frontier risks identified. Primary concern: prompt injection attacks.
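To make the pixel-counting idea concrete, here is a minimal sketch of the arithmetic involved in turning a location seen in a screenshot into a cursor movement. It is an illustration only, not Anthropic's method: the function names `pixel_offset` and `scale_to_screen`, the coordinates, and the assumption that the screenshot may be a downscaled copy of the real screen are all hypothetical.

```python
from typing import Tuple

Point = Tuple[int, int]  # (x, y) in pixels, origin at the top-left corner


def pixel_offset(cursor: Point, target: Point) -> Point:
    """Return (dx, dy): pixels to move right and down to get from cursor to target."""
    return (target[0] - cursor[0], target[1] - cursor[1])


def scale_to_screen(point: Point, screenshot_size: Point, screen_size: Point) -> Point:
    """Map a point located in a screenshot to real screen pixels.

    Assumes (hypothetically) that the screenshot is a uniformly resized
    copy of the screen, so each axis scales by a constant factor.
    """
    scale_x = screen_size[0] / screenshot_size[0]
    scale_y = screen_size[1] / screenshot_size[1]
    return (round(point[0] * scale_x), round(point[1] * scale_y))


# Hypothetical values: the cursor is at (100, 200) and a button centre
# is seen at (340, 180) in a 1280x800 screenshot of a 2560x1600 screen.
dx, dy = pixel_offset((100, 200), (340, 180))
print(dx, dy)  # 240 -20

print(scale_to_screen((340, 180), (1280, 800), (2560, 1600)))  # (680, 360)
```

An error of a few pixels in either step can land the click on the wrong control, which is one way to read the emphasis on counting pixels accurately.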

Implications

Safety research and capability-development transparency. Publishing the training methodology for computer use is unusual: the post details specific safety decisions (no internet access during training) and the generalization behavior (Anthropic reports being surprised by how rapidly the model generalized). The transparency is partly risk communication: Anthropic is setting expectations that this is an early-stage capability.
Pixel-counting as fundamental. The detail that cursor positioning required learning to count pixels is a concrete example of how seemingly simple GUI interactions depend on non-obvious capabilities. It helps explain why 14.9% on OSWorld is a stronger result than it first appears, given the narrow training data.
Prompt injection as the primary risk. Naming prompt injection (rather than jailbreaks or misuse) as the primary computer use safety concern is technically precise: an agent that reads the screen and executes computer actions is especially exposed to adversarial on-screen content that instructs it to take unintended actions.
Watch: how the training approach evolved in subsequent versions; whether the prompt injection risk materialized in practice; the trajectory from 14.9% toward human-level OSWorld performance.
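For tracking that trajectory, the reported OSWorld figures reduce to a small calculation. The sketch below uses only the numbers quoted in the summary (14.9%, 7.7%, and the 70-75% human range); the function name `osworld_gap` and the choice of metrics are illustrative assumptions, not part of the source.

```python
def osworld_gap(model: float, next_best: float, human_low: float, human_high: float):
    """Return (ratio to next-best, gap to low human estimate, gap to high human estimate).

    All inputs are OSWorld scores in percent; gaps are in percentage points.
    """
    ratio = model / next_best
    return (
        round(ratio, 2),
        round(human_low - model, 1),
        round(human_high - model, 1),
    )


# Figures as reported in the summary above.
ratio, gap_low, gap_high = osworld_gap(14.9, 7.7, 70.0, 75.0)
print(ratio)              # 1.94
print(gap_low, gap_high)  # 55.1 60.1
```

On these figures the model scores roughly twice the next-best system while remaining 55 to 60 percentage points short of human performance.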
