2026-03-25 · Anthropic

Claude Code auto mode: a safer way to skip permissions

security · agents · models

Source: Anthropic Engineering · Date: 2026-03-25 · URL: https://www.anthropic.com/engineering/claude-code-auto-mode

Summary

Claude Code auto mode addresses approval fatigue (users approve 93% of permission prompts) with a two-layer classifier. The input layer is a prompt-injection probe that runs on tool outputs. The output layer is a transcript classifier that evaluates each action before execution, using a fast single-token filter followed by chain-of-thought reasoning on the actions that filter flags. The classifier deliberately strips assistant reasoning from its context so that the agent cannot influence its own classification. Trade-off: a 0.4% false-positive rate is accepted alongside a 17% false-negative rate on real risky actions.
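The output-layer flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Anthropic's implementation: the `Message` roles, every function name, and the two toy stand-ins for model calls are hypothetical. Only two behaviors are taken from the post: assistant reasoning is removed before classification, and a cheap filter runs first with deliberate review reserved for flagged actions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # hypothetical roles: "user", "assistant_reasoning", "tool_result"
    content: str


Verdict = Callable[[List[Message], str], bool]  # True means "looks risky"


def strip_assistant_reasoning(transcript: List[Message]) -> List[Message]:
    """Drop the agent's own reasoning so it cannot argue its way past the classifier."""
    return [m for m in transcript if m.role != "assistant_reasoning"]


def classify_action(transcript: List[Message], action: str,
                    fast_filter: Verdict, slow_review: Verdict) -> str:
    """Two-stage gate: cheap filter on every action, deliberate review only if flagged."""
    context = strip_assistant_reasoning(transcript)
    if not fast_filter(context, action):
        return "allow"
    return "block" if slow_review(context, action) else "allow"


def toy_fast_filter(context: List[Message], action: str) -> bool:
    # Stand-in for a single-token model verdict (assumption: simple keyword match).
    return any(marker in action for marker in ("rm -rf", "--force", "| sh"))


def toy_slow_review(context: List[Message], action: str) -> bool:
    # Stand-in for chain-of-thought review (assumption: risky unless the
    # visible context literally mentions the action).
    visible_text = " ".join(m.content for m in context)
    return action not in visible_text


transcript = [
    Message("user", "Run the tests"),
    Message("assistant_reasoning", "The user surely wants rm -rf / here"),
]

print(classify_action(transcript, "pytest -q", toy_fast_filter, toy_slow_review))
print(classify_action(transcript, "rm -rf /", toy_fast_filter, toy_slow_review))
```

In this toy setup the stripping step is what matters: if the reasoning message were left in the context, the toy review would find the command mentioned there and let it through, which is the self-influence the post says the design avoids.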

Implications

The Claude Code security thread. This pairs directly with the sandboxing post: sandboxing reduces the blast radius of any individual action, while auto mode’s classifier is meant to stop the worst actions before they execute. Together they form the two-layer safety architecture for autonomous Claude Code operation.

The 93% approval rate is the key data point. Permission prompts are functionally no-ops when users approve 93% of them. This supports the design decision to move safety enforcement from UX friction to automated classification: humans are not reliable reviewers for high-frequency, low-context permission requests.

The 17% false-negative rate is significant. Roughly one in six genuinely risky actions gets through. Anthropic explicitly targets this system at --dangerously-skip-permissions users rather than high-stakes infrastructure: the claim is “better than nothing” for automation contexts, not a guarantee of safety. This is an honest trade-off disclosure and a useful calibration point for shops considering auto mode in production.
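To make the trade-off concrete, here is a small back-of-the-envelope calculation. The two rates (17% false negatives, 0.4% false positives) come from the post; the workload of 10,000 actions with 1% of them genuinely risky is purely an assumption chosen for illustration, and the function name is hypothetical.

```python
def expected_outcomes(total_actions: int, risky_fraction: float,
                      fnr: float = 0.17, fpr: float = 0.004) -> dict:
    """Expected counts given a false-negative rate (fnr) and false-positive rate (fpr)."""
    risky = total_actions * risky_fraction
    benign = total_actions - risky
    return {
        "risky_missed": risky * fnr,          # risky actions the classifier lets through
        "risky_caught": risky * (1 - fnr),    # risky actions it blocks
        "benign_blocked": benign * fpr,       # safe actions it wrongly blocks
    }


# Assumed workload: 10,000 actions, 1% genuinely risky.
for name, count in expected_outcomes(10_000, 0.01).items():
    print(f"{name}: {count:.1f}")
```

Under that assumed workload, about 17 of the 100 risky actions would slip through and about 40 of the 9,900 benign ones would be blocked. The ratio shifts with the real share of risky actions, which the post summary here does not state.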
