2025-11-24 · Anthropic

Mitigating the risk of prompt injections in browser use

security · agents · models · enterprise

Source: Anthropic Research
Date: 2025-11-24
URL: https://www.anthropic.com/research/prompt-injection-defenses

Summary

Anthropic evaluated Claude Opus 4.5 as a browser agent against a Best-of-N attacker allowed 100 attempts per environment. The defense has three layers: reinforcement learning (RL) during training, improved prompt-injection classifiers, and human red-teaming. Result: a 1% attack success rate, substantially lower than for earlier versions. The authors flag that even 1% still represents meaningful risk at deployment scale.
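A Best-of-N success rate counts an environment as compromised if any one of the N attempts works, so it is a stricter number than a per-attempt rate. The sketch below shows that metric and, under a loudly stated assumption that attempts succeed independently with a fixed probability (real adaptive attackers are not independent), how a per-environment rate maps back to a per-attempt rate. The function names and the toy harness are hypothetical illustrations, not Anthropic's evaluation code.

```python
from typing import Callable, Sequence


def best_of_n_asr(
    environments: Sequence[str],
    attempt_succeeds: Callable[[str, int], bool],
    n: int = 100,
) -> float:
    """Fraction of environments in which at least one of n attempts succeeds."""
    if not environments:
        raise ValueError("need at least one environment")
    compromised = 0
    for env in environments:
        # any() stops at the first success: one win compromises the environment.
        if any(attempt_succeeds(env, i) for i in range(n)):
            compromised += 1
    return compromised / len(environments)


def per_env_from_per_attempt(p: float, n: int = 100) -> float:
    """Per-environment rate if each attempt independently succeeds with prob p (assumption)."""
    return 1.0 - (1.0 - p) ** n


def per_attempt_from_per_env(asr: float, n: int = 100) -> float:
    """Inverse of per_env_from_per_attempt under the same independence assumption."""
    return 1.0 - (1.0 - asr) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical toy harness: only "a" falls, and only on its last attempt.
    def toy_attempt(env: str, i: int) -> bool:
        return env == "a" and i == 99

    print(best_of_n_asr(["a", "b", "c", "d"], toy_attempt))  # 0.25
    print(per_attempt_from_per_env(0.01))
```

Under the independence assumption, a 1% per-environment rate at N = 100 corresponds to roughly a 0.01% chance per individual attempt, which is why a Best-of-N figure is a demanding stress test rather than a casual-attacker rate.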

Implications

This is the agent-safety / prompt-injection thread applied to real-world browser use. 1% sounds good until you model it at scale: millions of browser-agent sessions run daily, so even if only a fraction of them meet an injection attempt, a 1% success rate still means thousands of successful attacks (hypothetically, 1 million sessions with 10% exposure gives about 1,000 per day). The three-layer defense is the right architecture but is not yet sufficient, and it is likely the current state of the art for commercial browser agents. The “meaningful risk” acknowledgment in the post itself is notable: Anthropic is shipping this while publicly flagging that the problem is not solved. Watch for Claude for Chrome’s attack surface to remain a persistent target, and for injection defense to become a formal capability threshold in Anthropic’s Responsible Scaling Policy.
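The scale argument is a simple expected-value calculation: sessions × fraction exposed to an injection × success rate per exposure. The sketch below makes that explicit. The session count and exposure fractions are hypothetical inputs, and treating the reported 1% as a per-exposure probability is an assumption, since the published figure was measured against a 100-attempt adaptive attacker rather than in the field.

```python
def expected_successful_attacks(
    sessions_per_day: int,
    exposure_rate: float,
    attack_success_rate: float,
) -> float:
    """Expected number of successful injections per day.

    sessions_per_day: total browser-agent sessions (hypothetical input).
    exposure_rate: fraction of sessions that meet an injection attempt (assumed).
    attack_success_rate: chance a single exposure succeeds (e.g. the reported 1%).
    """
    if sessions_per_day < 0:
        raise ValueError("sessions_per_day must be non-negative")
    for name, value in (
        ("exposure_rate", exposure_rate),
        ("attack_success_rate", attack_success_rate),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sessions_per_day * exposure_rate * attack_success_rate


if __name__ == "__main__":
    # Hypothetical sweep: 1M sessions/day at a 1% success rate, varying exposure.
    for exposure in (0.01, 0.10, 0.50, 1.00):
        n = expected_successful_attacks(1_000_000, exposure, 0.01)
        print(f"exposure {exposure:.0%}: ~{n:,.0f} successful attacks/day")
```

Across that sweep the expected count runs from about 100 to 10,000 successful attacks per day, which is the point of the “meaningful risk” caveat: the rate stays fixed while the absolute number grows linearly with usage.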