Mitigating the risk of prompt injections in browser use
Source: Anthropic Research
Date: 2025-11-24
URL: https://www.anthropic.com/research/prompt-injection-defenses
Summary
Claude Opus 4.5 was evaluated in browser agent contexts against a Best-of-N attacker given 100 attempts per environment. The defense has three layers: reinforcement learning (RL) during training, improved injection classifiers, and human red-teaming. Result: a 1% attack success rate, down significantly from earlier versions. The authors flag that 1% still represents meaningful risk at deployment scale.
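To make the Best-of-N figure concrete, here is a minimal sketch of how a success rate over N attempts relates to a per-attempt rate. It assumes attempts are independent and equally likely to succeed, which is a simplification of mine and not something the source states (an adaptive attacker's attempts are correlated). The function names are hypothetical.

```python
def best_of_n_rate(per_attempt: float, n_attempts: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - per_attempt) ** n_attempts


def implied_per_attempt_rate(best_of_n: float, n_attempts: int) -> float:
    """Invert best_of_n_rate: the per-attempt probability implied by a best-of-N rate.

    Assumption (not from the source): attempts are independent and identical.
    """
    return 1.0 - (1.0 - best_of_n) ** (1.0 / n_attempts)


if __name__ == "__main__":
    # 1% success over 100 attempts, the figures quoted in the summary.
    p = implied_per_attempt_rate(0.01, 100)
    print(f"implied per-attempt success probability: {p:.6f}")
    print(f"round trip back to best-of-100: {best_of_n_rate(p, 100):.4f}")
```

Under that independence assumption, a 1% rate over 100 attempts corresponds to roughly a 0.01% chance per individual attempt.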
Implications
This belongs to the agent safety / prompt injection thread, in the real-world browser use context. A 1% attack success rate sounds good until you model it at deployment scale. The figure is measured against an attacker who gets 100 attempts per environment, so it applies to sessions that actually meet an injection attempt rather than to every session; but if millions of browser agent sessions run daily, even a small exposed fraction can still mean hundreds or thousands of successful attacks per day. The three-layer defense is the right architecture but not yet sufficient. This is likely the current state of the art for commercial browser agents. The “meaningful risk” acknowledgment in the post itself is notable: Anthropic is shipping this while publicly flagging that the problem is not solved. Watch for Claude for Chrome’s attack surface as a persistent target, and for injection defense becoming a formal capability threshold in Anthropic’s scaling policy.
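The scale argument can be sketched as simple arithmetic. The session count and exposure fraction below are illustrative assumptions, not figures from the source; only the 1% success rate comes from the post, and the function name is hypothetical.

```python
def expected_successful_attacks(
    sessions_per_day: float,
    exposure_fraction: float,
    attack_success_rate: float,
) -> float:
    """Expected successful injections per day.

    sessions_per_day:    total browser agent sessions (assumed, not from the source)
    exposure_fraction:   share of sessions that meet an injection attempt (assumed)
    attack_success_rate: chance an attempted attack succeeds (1% in the source)
    """
    return sessions_per_day * exposure_fraction * attack_success_rate


if __name__ == "__main__":
    ASR = 0.01  # attack success rate reported in the post
    for exposure in (1.0, 0.1, 0.01):
        n = expected_successful_attacks(1_000_000, exposure, ASR)
        print(f"exposure {exposure:>5.0%}: about {n:,.0f} successful attacks/day")
```

With one million daily sessions, the estimate ranges from about 10,000 successful attacks per day if every session is attacked down to about 100 if only 1% are. The exposure fraction dominates the result, and the source does not report it.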