2025-10-09 · Anthropic

A small number of samples can poison LLMs of any size

ecosystem

read at source ↗ www.anthropic.com

A small number of samples can poison LLMs of any size

Source: Anthropic Research Date: 2025-10-09 URL: https://www.anthropic.com/research/small-samples-poison

Summary

Backdoor attacks on LLMs (600M–13B parameters) require a near-constant absolute number of poisoned documents regardless of model size. 250 malicious documents with a trigger phrase (<SUDO>) successfully implant a backdoor in a 13B model training on 20x more data than the smallest model tested. Attack success measured by output perplexity on trigger.

Implications

This is the training data security thread with a stark implication: data poisoning doesn’t get harder as models scale. If you’re thinking “my trillion-token dataset dilutes any attack,” you’re wrong — 250 documents is the constant. This directly motivates data pipeline security as a non-negotiable at every scale. The absolute-count framing shifts defense strategy: you need to audit for specific trigger patterns, not just monitor data proportions. Connects to the Persona Vectors work on training data filtering — if you can’t detect the 250 poisoned samples before training, you’ll need post-hoc detection methods. Watch for this feeding into enterprise fine-tuning guardrails and supply chain security for training data vendors.

← all signals