2026-02-18 · Anthropic

Measuring AI agent autonomy in practice

agentsmodels

Measuring AI agent autonomy in practice

Source: Anthropic Research Date: 2026-02-18 URL: https://www.anthropic.com/research/measuring-agent-autonomy

Summary

Analysis of millions of human-agent interactions across Claude Code and public API. 99.9th percentile turn duration nearly doubled from under 25 to over 45 minutes between Oct 2025 and Jan 2026. Experienced users auto-approve 20% → 40% of actions and interrupt more often (5% → 9%). Claude initiates clarification stops 2x more than humans on complex tasks. Software engineering dominates (~50% of API usage); healthcare, finance, cybersecurity emerging.

Implications

Real-world measurement of the agent autonomy thread in production. The near-doubling of extreme turn duration in three months is the trend signal — agents are running longer and longer without interruption as users trust increases. The experienced-user auto-approval increase (20% → 40%) combined with more interruptions is the nuanced finding: advanced users delegate more but also catch more problems. Claude initiating clarification 2x more than humans on complex tasks is the safety-relevant result — model-initiated oversight is calibrated to task complexity in a way user behavior isn’t. Healthcare and finance emerging uses are the regulatory watch areas. This is the post-deployment monitoring infrastructure paper that the Trustworthy Agents post called for.

← all signals