2026-02-24 · Anthropic

Responsible Scaling Policy Version 3.0

Source: Anthropic · Date: 2026-02-24 · URL: https://www.anthropic.com/news/responsible-scaling-policy-v3

Summary

Anthropic published RSP v3.0 on February 24, 2026, acknowledging that v2’s theory of change “partially failed”: capability thresholds proved too ambiguous to support industry consensus, and the geopolitical climate shifted toward competitiveness. The revision splits the policy into two tracks: unilateral commitments that Anthropic will implement regardless of what others do, and broader recommendations that require multilateral action. It also introduces three new mechanisms: a public Frontier Safety Roadmap (covering Security, Alignment, Safeguards, and Policy), Risk Reports published every 3–6 months with minimal redactions, and third-party external review when warranted.

Implications

The explicit admission that v2 “partially failed” to drive industry consensus is significant: it signals that Anthropic has abandoned the assumption that publishing a rigorous RSP would create normative pressure on competitors. The pivot to “realistic unilateral commitments” is a strategic retreat from the idea that safety governance can be self-coordinating.
Risk Reports published every 3–6 months, with external review when warranted, are a transparency commitment with real teeth if Anthropic holds to them: they create an auditable record of what Anthropic knew about capability risks at each release cycle, which matters for accountability as models approach higher ASL thresholds.
The dual-track structure (unilateral vs. multilateral) implicitly acknowledges that the multilateral track requires regulatory or governmental forcing functions — this reads as pre-positioning for whatever policy frameworks emerge from EU AI Act implementation, the Five Eyes guidance, and U.S. executive action.
The Frontier Safety Roadmap’s inclusion of insider threat monitoring and automated red-teaming as graded goals moves safety from policy abstraction to engineering deliverables — worth tracking whether these show up in subsequent Risk Reports as actually shipped.
