2025-05-22 · Anthropic

Introducing Claude 4

Tags: pricing, agents, models

Source: Anthropic · Date: 2025-05-22 · URL: https://www.anthropic.com/news/claude-4

Summary

Anthropic launched Claude Opus 4 and Claude Sonnet 4 on May 22, 2025. Opus 4 scores 72.5% on SWE-bench Verified and 43.2% on Terminal-bench; Sonnet 4 scores 72.7% on SWE-bench Verified. Pricing is unchanged: Opus 4 costs $15/$75 per million input/output tokens, Sonnet 4 costs $3/$15. Both support extended thinking with parallel tool use and improved memory. Claude Code went GA at the same time, with IDE integrations and GitHub Actions support. Key partners: Cursor, GitHub, Replit, Block. Anthropic claims both models are “65% less likely to engage in shortcut behavior” on agentic tasks than Sonnet 3.7.

Implications

Claude model cadence / Claude Code thread. Claude 4 is the generation jump from 3.7, and, critically, Claude Code's GA ships alongside it. The model and the product are now co-released, which sets the pattern for subsequent Claude releases.
Sonnet 4 beating Opus 4 on SWE-bench Verified (72.7% vs. 72.5%). The mid-tier model edging out the flagship on the primary coding benchmark is either measurement noise or a genuine capability inversion. If Sonnet 4 matches Opus 4 at coding for one-fifth the price, the Opus tier needs differentiation elsewhere (reasoning, extended context, autonomous tasks).
“65% less shortcut behavior.” The specific anti-shortcut metric is a direct response to developer complaints about earlier Claude versions gaming evaluations or taking the path of least resistance in agentic tasks. This is the reliability claim that matters for production deployment.
Claude Code GA. Going generally available marks the end of the research preview phase — Anthropic is now running Claude Code as a shipping product with SLA commitments, not a beta. This is the milestone that enables the $1B ARR trajectory.
Watch: whether Sonnet 4 or Opus 4 becomes the preferred coding model in Cursor/GitHub; Terminal-bench as an emerging standard benchmark; “shortcut behavior” metric methodology.
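
The two quantitative points in the Sonnet-versus-Opus item (the 5x price gap and the 0.2-point benchmark gap) can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not Anthropic's methodology. The prices and scores come from the summary; the workload size is hypothetical, and the noise estimate assumes each score is a single pass over the 500 tasks of SWE-bench Verified with independent pass/fail outcomes, which the announcement does not state.

```python
import math

# USD per million tokens (input, output), as quoted in the summary.
PRICES = {
    "opus-4": (15.0, 75.0),
    "sonnet-4": (3.0, 15.0),
}


def job_cost(model, input_tokens, output_tokens):
    """USD cost of one job at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def binomial_std_error(pass_rate, n_tasks):
    """Standard error of a pass rate measured on n_tasks independent tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


# Hypothetical workload: 2M input tokens and 500k output tokens.
opus_cost = job_cost("opus-4", 2_000_000, 500_000)
sonnet_cost = job_cost("sonnet-4", 2_000_000, 500_000)
cost_ratio = opus_cost / sonnet_cost

# Assumption: both scores are single runs over 500 independent tasks.
N_TASKS = 500
score_gap = 0.727 - 0.725
std_error = binomial_std_error(0.725, N_TASKS)

print(f"Opus 4: ${opus_cost:.2f}, Sonnet 4: ${sonnet_cost:.2f}, ratio {cost_ratio:.1f}x")
print(f"Score gap: {score_gap:.3f}, one standard error: {std_error:.3f}")
```

Because both the input and output prices differ by the same factor, the cost ratio is 5x for any mix of input and output tokens. Under the stated assumptions, one standard error is roughly two percentage points, about ten times the 0.2-point gap, which is why the measurement-noise reading is plausible.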
