2025-10-23 · Google

Strengthening our Frontier Safety Framework

Source: DeepMind
Date: 2025-10-23
URL: https://deepmind.google/blog/strengthening-our-frontier-safety-framework/

Summary

Google DeepMind published FSF 3.1, the third iteration of its Frontier Safety Framework, introducing a new Critical Capability Level (CCL) for “harmful manipulation”: AI models that could systematically alter beliefs and behaviors at scale. It also adds Tracked Capability Levels (TCLs) for earlier detection of sub-critical risks, with an April 2026 tier. Expanded misalignment protocols cover deceptive instrumental reasoning, model interference with shutdown, and unsupervised acceleration of AI R&D.

Implications

Harmful manipulation as a new CCL is the policy marker. Adding AI persuasion and manipulation as a Critical Capability Level alongside bioweapons and cyberweapons puts behavioral influence on the same governance tier as existential risks. That is a significant policy escalation, and it is likely to shape how regulators think about social-media-scale AI deployment.

FSF versioning pace reflects model release pace. Going from FSF 1.0 (May 2024) to 2.0 to 3.1 in roughly a year and a half is fast for safety governance. The versioning cadence is keeping pace with the Gemini model release cadence, which suggests Google is at minimum trying to show that its safety framework evolves with capabilities. Whether the framework is actually effective is a separate question.

TCLs for April 2026 suggest pre-deployment concern. A tier dated specifically to April 2026 implies DeepMind expects that upcoming models (likely Gemini 3 or post-3.1) may cross new capability thresholds that need monitoring before they reach CCL status. Safety governance this explicitly forward-looking is rare.

Watch:

  • Whether the harmful manipulation CCL leads to any actual capability gating in Gemini releases
  • EU AI Act alignment — the CCL framework maps well to the Act’s prohibited/high-risk classification structure
  • Anthropic RSP version history as a comparison — are safety frameworks converging or diverging across labs?
