Activating AI Safety Level 3 protections
Source: Anthropic
Date: 2025-05-22
URL: https://www.anthropic.com/news/activating-asl3-protections
Summary
Anthropic activated ASL-3 protections for Claude Opus 4 on May 22, 2025, the first time the elevated safety level had been triggered. Trigger: continued CBRN capability improvements meant Anthropic could no longer “clearly rule out ASL-3 risks.” Deployment protections: Constitutional Classifiers (real-time monitoring of CBRN-related inputs and outputs), an expanded jailbreak bug bounty, and synthetic jailbreak generation. Security controls: more than 100 combined prevention and detection controls, including two-party authorization for model weight access, egress bandwidth controls, and enhanced endpoint software controls.
Implications
- Safety/policy / RSP activated for real. The Responsible Scaling Policy (first published 2023, updated to v2 in 2024 and v3 in 2026) had until then been a theoretical framework; ASL-3 activation was the first time it triggered actual operational changes. The precautionary framing (“cannot clearly rule out”) matters: Anthropic did not claim the model had crossed the threshold, but treated the uncertainty itself as reason enough to apply the elevated controls.
- Constitutional Classifiers as real-time defense. Deploying Constitutional Classifiers as a real-time filter at ASL-3 is a specific architectural decision: the filter is itself a trained model layered over the production model's inputs and outputs, not a set of fixed rules. This adds a separate safety layer that can be retrained and updated independently of the underlying model.
- Two-party authorization for weights. Activating weight-access controls at ASL-3 is operationally significant: access to model weights now requires authorization from a second person, so no single employee or compromised account can reach the weights alone. That changes internal workflows and makes exfiltration harder.
- Code with Claude conference same day. ASL-3 activation coincided with the Code with Claude developer conference (May 22, 2025). Anthropic signaled capability and safety at once, pairing its highest-capability developer-facing launch with its strongest public safety posture.
- Watch: whether and when ASL-4 protections are defined and activated; how Constitutional Classifier performance evolves; whether two-party weight authorization slows internal operations.
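Anthropic has not published the serving code behind Constitutional Classifiers, but the layering described in the Constitutional Classifiers bullet can be sketched as a wrapper around a streaming model. This minimal Python sketch assumes classifiers that return a harm score in [0, 1] and a hypothetical blocking threshold; `GuardedModel`, `toy_classifier`, `toy_model`, and `updated_classifier` are illustrative names. The keyword-based toy classifiers are placeholders so the example runs; the real classifiers are trained models, not rules.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass, replace

# Hypothetical interfaces: a classifier maps text to a harm score in [0, 1],
# and a model maps a prompt to a stream of text chunks.
Classifier = Callable[[str], float]
Model = Callable[[str], Iterable[str]]

REFUSAL = "[blocked by safety classifier]"


@dataclass(frozen=True)
class GuardedModel:
    """Wraps a model with input and output classifiers that can be swapped independently."""

    model: Model
    input_classifier: Classifier
    output_classifier: Classifier
    threshold: float = 0.5  # hypothetical blocking threshold

    def generate(self, prompt: str) -> Iterator[str]:
        # Screen the prompt before the model sees it.
        if self.input_classifier(prompt) >= self.threshold:
            yield REFUSAL
            return
        emitted = ""
        for chunk in self.model(prompt):
            # Score the cumulative output so content split across chunks is
            # still caught; halt the stream at the first flagged chunk.
            if self.output_classifier(emitted + chunk) >= self.threshold:
                yield REFUSAL
                return
            emitted += chunk
            yield chunk


def toy_classifier(text: str) -> float:
    # Placeholder only: keyword matching is the rule-based approach the note
    # contrasts with; a real Constitutional Classifier is a trained model.
    return 1.0 if "forbidden" in text.lower() else 0.0


def updated_classifier(text: str) -> float:
    # Hypothetical "retrained" classifier that also flags a new phrasing.
    lowered = text.lower()
    return 1.0 if "forbidden" in lowered or "here is" in lowered else 0.0


def toy_model(prompt: str) -> Iterator[str]:
    # Stand-in model that streams the same chunks for any prompt.
    yield from ["Here ", "is ", "a ", "forbidden ", "detail."]


if __name__ == "__main__":
    guarded = GuardedModel(toy_model, toy_classifier, toy_classifier)
    print(list(guarded.generate("hello")))
    # Updating the safety layer swaps a classifier; the wrapped model is untouched.
    updated = replace(guarded, output_classifier=updated_classifier)
    print(list(updated.generate("hello")))
```

Because the classifiers are ordinary fields, `dataclasses.replace` can swap in an updated classifier while the wrapped model stays the same object, which is the independence the bullet describes.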
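Anthropic's post does not describe how its two-party authorization is implemented. As a rough illustration of the rule in the two-party authorization bullet, the minimal Python sketch below assumes an in-memory approval gate in which access is granted only after an authorized approver other than the requester signs off. `TwoPartyGate`, `WeightAccessRequest`, `AuthorizationError`, and the approver names are hypothetical, and the sketch leaves out audit logging, expiry, and identity verification.

```python
from dataclasses import dataclass, field


class AuthorizationError(Exception):
    """Raised when an approval violates the two-party rule."""


@dataclass
class WeightAccessRequest:
    """Hypothetical request for access to model weights."""

    requester: str
    reason: str
    approvals: set[str] = field(default_factory=set)


class TwoPartyGate:
    """Grants weight access only after a second, distinct person approves."""

    def __init__(self, approvers: set[str]) -> None:
        self.approvers = frozenset(approvers)  # people allowed to approve

    def approve(self, request: WeightAccessRequest, approver: str) -> None:
        if approver not in self.approvers:
            raise AuthorizationError(f"{approver} is not an authorized approver")
        if approver == request.requester:
            raise AuthorizationError("requesters cannot approve their own requests")
        request.approvals.add(approver)

    def is_granted(self, request: WeightAccessRequest) -> bool:
        # Re-check on read so approvals added outside approve() cannot bypass
        # the rule: at least one valid approver other than the requester.
        return any(
            a in self.approvers and a != request.requester
            for a in request.approvals
        )


if __name__ == "__main__":
    gate = TwoPartyGate({"alice", "bob"})
    request = WeightAccessRequest(requester="alice", reason="evaluation run")
    print(gate.is_granted(request))  # no second party has approved yet
    gate.approve(request, "bob")
    print(gate.is_granted(request))
```

Checking the rule again in `is_granted` keeps a single self-added approval from counting, which is the property that makes one compromised account insufficient.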