2026-02-24 · Anthropic

Responsible Scaling Policy Version 3.0

Source: Anthropic · Date: 2026-02-24 · URL: https://www.anthropic.com/news/responsible-scaling-policy-v3

Summary

Anthropic published RSP v3.0 on February 24, 2026, acknowledging that v2’s theory of change “partially failed”: capability thresholds proved too ambiguous to support industry consensus, and the geopolitical climate shifted toward competitiveness. The revision splits the policy into two tracks: unilateral commitments that Anthropic will implement regardless, and broader recommendations that require multilateral action. It also introduces three mechanisms: a public Frontier Safety Roadmap (covering Security, Alignment, Safeguards, and Policy), Risk Reports published every 3–6 months with minimal redactions, and third-party external review when warranted.

Implications

  • The explicit admission that v2 failed to drive industry consensus is significant. It signals that Anthropic has abandoned the assumption that publishing a rigorous RSP would create normative pressure on competitors. The pivot to “realistic unilateral commitments” is a strategic retreat from the idea that safety governance can be self-coordinating.
  • Risk Reports published every 3–6 months with external review are a transparency commitment with real teeth, provided Anthropic keeps to it. They create an auditable record of what Anthropic knew about capability risks at each release cycle, which matters for accountability as models approach higher ASL thresholds.
  • The dual-track structure (unilateral vs. multilateral) implicitly acknowledges that the multilateral track requires regulatory or governmental forcing functions. This reads as pre-positioning for whatever policy frameworks emerge from EU AI Act implementation, the Five Eyes guidance, and U.S. executive action.
  • The Frontier Safety Roadmap’s inclusion of insider threat monitoring and automated red-teaming as graded goals moves safety from policy abstraction to engineering deliverables. It is worth tracking whether subsequent Risk Reports show these as actually shipped.