Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Source: Anthropic
Date: 2024-10-22
URL: https://www.anthropic.com/news/3-5-models-and-computer-use
Summary
Anthropic released an upgraded Claude 3.5 Sonnet with improved coding performance (SWE-bench Verified: 33.4% → 49.0%), announced Claude 3.5 Haiku (matching Claude 3 Opus on many evaluations while running faster), and introduced computer use in public beta. Computer use lets Claude operate a computer by viewing screenshots, moving the cursor, clicking, and typing. It is available through the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Early adopters include Replit, Canva, DoorDash, and Cognition.
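The screenshot, cursor, click, and type cycle described above can be pictured as a simple observe-act loop. The sketch below is illustrative only and calls no Anthropic API: `FakeDesktop`, `Action`, `scripted_model`, and `run_agent` are hypothetical names invented for this note, and the "model" is a fixed script standing in for a real model call.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One GUI action. The kinds used here are hypothetical labels."""
    kind: str  # "move", "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen; it records what was done to it."""
    cursor: tuple = (0, 0)
    typed: str = ""
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; a text summary keeps this runnable.
        return f"cursor={self.cursor} typed={self.typed!r} clicks={len(self.clicks)}"

    def apply(self, action: Action) -> None:
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click":
            self.clicks.append(self.cursor)
        elif action.kind == "type":
            self.typed += action.text


def scripted_model(observation: str, step: int) -> Action:
    """Hypothetical policy: replays a fixed script instead of calling a model."""
    script = [
        Action("move", x=120, y=40),
        Action("click"),
        Action("type", text="hello"),
    ]
    return script[step] if step < len(script) else Action("done")


def run_agent(desktop: FakeDesktop, model, max_steps: int = 10) -> list:
    """Observe the screen, ask the policy for an action, apply it, repeat."""
    history = []
    for step in range(max_steps):
        observation = desktop.screenshot()
        action = model(observation, step)
        if action.kind == "done":
            break
        desktop.apply(action)
        history.append(action.kind)
    return history


desktop = FakeDesktop()
history = run_agent(desktop, scripted_model)
print(history, desktop.screenshot())
```

The `max_steps` cap is a design choice of this sketch: any loop that lets a model drive a GUI needs some bound so that a confused policy cannot act indefinitely.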
Implications
- Computer use as a capability inflection. Controlling a computer's GUI, rather than only answering questions or calling APIs, is a technical prerequisite for fully autonomous agentic workflows. Claude scores only 14.9% on OSWorld against 70-75% for humans, but the direction of capability is clear.
- SWE-bench 49% as coding benchmark leader. The 49.0% score on SWE-bench Verified, which is built from real GitHub issues, was the highest published at the time of release. It established Claude as the leading coding model for autonomous software engineering tasks, and it is the benchmark that drives positioning in the developer and coding-agent market.
- Haiku matching Opus. Claude 3.5 Haiku matching Claude 3 Opus on many evaluations at Haiku pricing is the clearest demonstration of the capability-per-dollar improvement trajectory: each generation’s cheaper tier catches up with the prior generation’s premium tier.
- Watch: OSWorld scores in subsequent Claude releases; how the Replit, Canva, and DoorDash computer use integrations evolved; whether the “experimental” classification of computer use was ever removed.
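To put the benchmark figures quoted above on a common footing, the short calculation below derives the SWE-bench Verified gain and the remaining OSWorld gap to human performance. It uses only the numbers cited in this note; the variable names are illustrative.

```python
# Figures quoted in this note (all percentages).
swe_before, swe_after = 33.4, 49.0
osworld_claude = 14.9
osworld_human_low, osworld_human_high = 70.0, 75.0

# Absolute and relative improvement on SWE-bench Verified.
swe_gain_points = round(swe_after - swe_before, 1)
swe_gain_relative = round(100 * (swe_after - swe_before) / swe_before, 1)

# Remaining gap to the human range on OSWorld, in percentage points.
osworld_gap_low = round(osworld_human_low - osworld_claude, 1)
osworld_gap_high = round(osworld_human_high - osworld_claude, 1)

print(f"SWE-bench Verified gain: {swe_gain_points} points ({swe_gain_relative}% relative)")
print(f"OSWorld gap to humans: {osworld_gap_low} to {osworld_gap_high} points")
```

The SWE-bench Verified gain works out to 15.6 points, or roughly 47% relative to the earlier score, while the OSWorld gap to humans remains between about 55 and 60 points.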