2026-03-17 · Google

Measuring progress toward AGI: A cognitive framework

protocols · models · research

Source: DeepMind
Date: 2026-03-17
URL: https://deepmind.google/blog/measuring-progress-toward-agi-a-cognitive-framework/

Summary

Google DeepMind published an AGI measurement framework grounded in cognitive science. It identifies 10 cognitive abilities (perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem-solving, social cognition) and a three-stage evaluation protocol: benchmark the AI system, collect human baselines, and map AI performance against the human distribution. Google also launched a $200K Kaggle hackathon (March 17–April 16, 2026) targeting the five abilities with the largest evaluation gaps: learning, metacognition, attention, executive functions, and social cognition.
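The three-stage protocol can be illustrated with a minimal sketch. It assumes that "mapping against human distributions" means locating a model's score as a percentile within a sample of human scores on the same task. That reading, the function names, and the example scores are illustrative assumptions, not details taken from the DeepMind post.

```python
from bisect import bisect_right


def human_percentile(model_score: float, human_scores: list[float]) -> float:
    """Stage 3 (assumed form): percentage of human scores at or below the model's score."""
    if not human_scores:
        raise ValueError("need at least one human baseline score")
    ranked = sorted(human_scores)
    return 100.0 * bisect_right(ranked, model_score) / len(ranked)


def cognitive_profile(
    model_scores: dict[str, float],
    human_baselines: dict[str, list[float]],
) -> dict[str, float]:
    """Map each ability's model score (stage 1) onto its human baseline sample (stage 2)."""
    return {
        ability: human_percentile(score, human_baselines[ability])
        for ability, score in model_scores.items()
    }


# Hypothetical numbers, for illustration only.
model_scores = {"attention": 0.6, "metacognition": 0.3}
human_baselines = {
    "attention": [0.1, 0.2, 0.6, 0.9],
    "metacognition": [0.4, 0.5, 0.7, 0.8],
}
profile = cognitive_profile(model_scores, human_baselines)
```

Reporting a per-ability percentile rather than a single aggregate keeps the profile uneven where the model is uneven, which is the point of anchoring to human baselines instead of a fixed threshold.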

Implications

The five underrepresented cognitive abilities mark the real gaps in current AI evaluation. Metacognition, executive functions, attention, learning, and social cognition are where current frontier LLMs are weakest relative to humans, and where current benchmarks (MMLU, HumanEval, ARC) measure least well. By publishing this as a research framework, Google is saying that the existing benchmark surface is inadequate for measuring AGI progress.

The $200K Kaggle prize is an investment in evaluation infrastructure. Crowdsourcing benchmark design yields diverse, harder-to-game evaluations faster than a single research team can produce them. This follows the ARC Prize pattern (a community-incentivized benchmark program): Google is acknowledging that no single lab should design AGI evaluations.

The cognitive science framing is a deliberate choice over capability-based AGI definitions. Defining AGI in terms of human cognitive abilities (drawn from psychology and neuroscience) rather than task performance or benchmark scores reflects a different research culture from the one behind OpenAI’s or Anthropic’s framing. It anchors the discussion in human baselines, not arbitrary capability thresholds.

Watch:

  • Hackathon winning submissions — do the new evaluations reveal capability gaps not visible in current benchmarks?
  • Whether the cognitive framework gets adopted by independent evaluation organizations (HELM, LMSYS) as a supplementary evaluation surface
  • The performance of Google’s own models on the new cognitive benchmarks once published
