2025-04-02 · Google

Taking a responsible path to AGI

models · enterprise · research

Source: DeepMind · Date: 2025-04-02 · URL: https://deepmind.google/blog/taking-a-responsible-path-to-agi/

Summary

Google DeepMind published a position paper on AGI safety and governance that covers four risk categories: misuse (addressed by restricting access to dangerous capabilities, such as those that enable cyberattacks), misalignment (including specification gaming and deceptive alignment), accidents, and structural risks. Proposed mitigations include amplified oversight, in which AI systems help evaluate other AI systems' outputs; MONA (Myopic Optimization with Nonmyopic Approval); interpretability research; and monitoring for misaligned actions. On the governance side, the post cites the AGI Safety Council, internal review, and partnerships with Apollo, Redwood Research, and the Frontier Model Forum.

Implications

MONA (Myopic Optimization with Nonmyopic Approval) is the technical proposal worth tracking. Much AGI safety discussion stays philosophical; MONA is a concrete training method intended to keep an AI system's long-term planning understandable to humans. Whether it is viable at frontier model scale is an open research question, but it is at least specific enough to evaluate and critique.
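
To make the MONA idea concrete, here is a minimal Python sketch. It assumes, as a reading of the name rather than anything stated in the post, that the agent is optimized one step at a time (myopic) on the immediate reward plus an overseer's approval score for the action (nonmyopic approval), instead of on the multi-step return. The action names, reward values, and the `APPROVAL` table are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, float]  # (action label, immediate environment reward)


def discounted_returns(steps: List[Step], gamma: float = 1.0) -> List[float]:
    """Ordinary RL target: each step is credited with all later rewards."""
    returns: List[float] = []
    running = 0.0
    for _, reward in reversed(steps):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def mona_style_targets(
    steps: List[Step], approval: Callable[[str], float]
) -> List[float]:
    """Myopic target (assumed form): immediate reward plus overseer approval.

    Later rewards never flow back to earlier steps, so an early action is
    only reinforced if the overseer approves of it on its own terms.
    """
    return [reward + approval(action) for action, reward in steps]


# Hypothetical overseer: a lookup table standing in for a human or model judge.
APPROVAL = {"write_trivial_tests": -0.5, "write_real_tests": 0.5, "submit_code": 0.0}

# Hypothetical reward-hacking trajectory: trivial tests make the final check pass.
hacked = [("write_trivial_tests", 0.0), ("submit_code", 1.0)]

ordinary = discounted_returns(hacked)                      # [1.0, 1.0]
myopic = mona_style_targets(hacked, APPROVAL.__getitem__)  # [-0.5, 1.0]
```

Under the ordinary target, the trivial-tests step inherits the full payoff of the later step. Under the myopic target it is scored only by the overseer's disapproval. The sketch leaves out how approval is obtained and how the policy is trained, which are the parts that determine whether the approach scales.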

“Amplified oversight using AI systems to evaluate other AI responses” is essentially the Constitutional AI / RLAIF pattern formalized as safety policy. DeepMind is naming this as a safety technique for AGI, not just a training efficiency trick. That formalization matters: it means the AI-critiques-AI approach is being planned for frontier model safety, not just for improving Gemini output quality.
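
As an illustration of the AI-evaluates-AI pattern, the following Python sketch has several critic models score a response and escalates to a human reviewer when the critics score it low or disagree with each other. Everything here is hypothetical: the `Critic` signature, the thresholds, and the stub critics are assumptions for illustration, not DeepMind's design.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical critic interface: (prompt, response) -> score in [0, 1].
Critic = Callable[[str, str], float]


@dataclass
class Verdict:
    mean_score: float
    escalate_to_human: bool


def review(
    prompt: str,
    response: str,
    critics: List[Critic],
    min_score: float = 0.7,
    max_spread: float = 0.3,
) -> Verdict:
    """Score a response with every critic; escalate on low scores or disagreement."""
    if not critics:
        raise ValueError("at least one critic is required")
    scores = [critic(prompt, response) for critic in critics]
    mean = sum(scores) / len(scores)
    spread = max(scores) - min(scores)
    return Verdict(mean, mean < min_score or spread > max_spread)


# Stub critics standing in for model calls.
def strict(prompt: str, response: str) -> float:
    return 0.2 if "exploit" in response else 0.9


def lenient(prompt: str, response: str) -> float:
    return 0.8


ok = review("q", "Here is a summary.", [strict, lenient])
flagged = review("q", "Here is an exploit.", [strict, lenient])
```

Here `ok.escalate_to_human` is `False` and `flagged.escalate_to_human` is `True`. Keeping a human in the loop for low-scoring or contested cases is one plausible way to keep AI judges from being the final authority.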

External partnerships (Apollo, Redwood, FMF) signal that DeepMind can’t do this alone. Listing third-party safety organizations as partners acknowledges that internal safety teams are insufficient on their own: independent evaluation and red-teaming from specialized organizations are required. That’s a more credible posture than self-certification.

Watch:

Whether the AGI Safety Council has any disclosed decision-making power over model releases
MONA research publications: does the method scale to frontier model size?
How the misuse category (cyberattack capabilities) maps to actual capability gating in Gemini 2.5/3 releases
