2024-10-15 · Anthropic

Announcing our updated Responsible Scaling Policy

Source: Anthropic
Date: 2024-10-15
URL: https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy

Summary

Anthropic published RSP v2 (October 2024), introducing a more flexible risk governance framework. The update names two new explicit capability thresholds that trigger enhanced safeguards: autonomous AI research and development, and assistance with CBRN (chemical, biological, radiological, and nuclear) weapons. It also implements routine capability and safeguard assessments using safety case methodology. Anthropic acknowledged minor procedural shortfalls under the prior RSP. Co-founder Jared Kaplan was named Responsible Scaling Officer, and a Head of Responsible Scaling role was opened.

Implications

Safety/policy posture thread. RSP v2 is the first significant revision of the policy, arriving after one year of implementation. That it incorporates “practical insights” suggests real gaps were found in the original. The acknowledgment of “falling short of previous requirements in minor procedural ways” is a rare piece of self-criticism in a public document.
Two capability thresholds codified. Autonomous AI R&D and CBRN assistance, the two named ASL-3 triggers, are now explicit rather than implied. This creates a clearer test that Anthropic and external evaluators can apply. It also limits Anthropic’s flexibility to define the threshold post hoc.
Jared Kaplan as RSO. Naming Kaplan (co-founder, scaling laws pioneer) as Responsible Scaling Officer is a significant internal governance signal: the person most knowledgeable about what scaling produces is now accountable for deciding when that scaling triggers safety escalation.
Safety case methodology. Importing safety case methodology from high-stakes engineering (nuclear, aviation) into AI scaling decisions is a substantive methodological choice. It requires explicit arguments that safety is maintained, not just the absence of identified harms.
Watch: whether the “minor procedural shortfalls” are disclosed in detail; how the autonomous AI R&D threshold is defined operationally; whether Kaplan’s RSO role produces public communications about specific model evaluations.
