2025-09-29 · Anthropic

Introducing Claude Sonnet 4.5

security · pricing · agents · models · infrastructure

Source: Anthropic · Date: 2025-09-29 · URL: https://www.anthropic.com/news/claude-sonnet-4-5

Summary

Anthropic launched Claude Sonnet 4.5 at the same price as Sonnet 4: $3 per million input tokens and $15 per million output tokens. Key benchmarks: 77.2% on SWE-bench Verified (82.0% in the high-compute configuration) and 61.4% on OSWorld (up from 42.2% four months earlier). New features: Claude Code checkpoints (instant rollback), a native VS Code extension, context editing and memory tools for long-running agent work, direct code execution and file creation, and the Claude Agent SDK for building custom agents. Alignment improvements: reduced sycophancy, deception, and power-seeking, plus stronger prompt injection defenses. The model was released under ASL-3 protections.
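The pricing above translates into per-request cost with simple arithmetic. The following is a minimal sketch, assuming the $3 figure applies to input tokens and the $15 figure to output tokens, and ignoring any discounts or surcharges (caching, batching, long-context rates) that the summary does not mention. The function name and the example workload are hypothetical.

```python
# Rates taken from the summary: USD per million tokens.
# Assumption: $3 is the input rate and $15 is the output rate.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat per-million-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of context in, 20k tokens out.
    cost = estimate_cost_usd(200_000, 20_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, output-heavy workloads dominate the bill even when their token counts are much smaller.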

Implications

  • Model capability / OSWorld 61.4% as inflection. The OSWorld score of 61.4% (vs. 42.2% four months earlier and 14.9% at the original computer use launch in October 2024) is the most dramatic benchmark improvement in the release. Computer use went from “experimental curiosity” to “approaching useful” in under a year.
  • SWE-bench 77.2% as coding standard. Sonnet 4.5 scores 77.2% on SWE-bench Verified (real GitHub issues), rising to 82.0% in the high-compute configuration, so Claude resolves roughly 4 in 5 of these real software engineering tasks when given sufficient compute. This is the number that drives the “autonomous software engineer” product positioning.
  • Same price as Sonnet 4. Maintaining pricing while dramatically improving capability is the recurring Claude story — each generation’s mid-tier exceeds the previous generation’s top tier at the same or lower price.
  • Agent SDK launch. The Claude Agent SDK is the developer tool that makes Claude Sonnet 4.5 the default choice for building agentic applications — it’s the framework that abstracts multi-step tool use, memory, and context management.
  • ASL-3 for Sonnet tier. ASL-3 protections applying to Sonnet 4.5 (not just Opus) means the mid-tier model had reached capability levels requiring elevated safety controls. The threshold itself did not move; rather, capabilities that trigger ASL-3 safeguards were no longer confined to the flagship tier.
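To put the benchmark figures in the bullets above on a common footing, the sketch below computes both the percentage-point gain and the relative gain between successive OSWorld scores. The scores are the ones quoted in this summary; the helper name and the labels are illustrative only, and the calculation assumes the three scores are directly comparable.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    relative = points / before * 100
    return points, relative


# OSWorld scores as quoted in the summary (percent of tasks completed).
osworld = [
    ("computer use launch (Oct 2024)", 14.9),
    ("four months before this release", 42.2),
    ("Claude Sonnet 4.5", 61.4),
]

if __name__ == "__main__":
    # Compare each score with the one that follows it.
    for (label_a, a), (label_b, b) in zip(osworld, osworld[1:]):
        points, relative = gain(a, b)
        print(f"{label_a} -> {label_b}: +{points:.1f} points ({relative:.0f}% relative)")
```

Reporting both numbers matters here: the earlier jump is larger in relative terms because it starts from a low base, while the later jump is the one that moves the score past the halfway mark.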
