A small number of samples can poison LLMs of any size
Source: Anthropic Research
Date: 2025-10-09
URL: https://www.anthropic.com/research/small-samples-poison
Summary
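Backdoor attacks on LLMs from 600M to 13B parameters require a near-constant absolute number of poisoned documents, regardless of model size. In the study, 250 malicious documents containing a trigger phrase (<SUDO>) were enough to implant a backdoor in a 13B model, even though that model trained on 20x more data than the smallest model tested. The backdoor studied makes the model emit gibberish once the trigger appears, so attack success is measured by the perplexity of the output that follows the trigger: high perplexity there means the backdoor fired.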
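To make the perplexity metric concrete, here is a minimal sketch of how it can be computed from per-token log-probabilities. The log-probability values are made up for illustration and the function name is hypothetical; this is not the study's evaluation code, only the standard definition of perplexity applied to a clean continuation versus one following the trigger.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities (illustrative values, not measured data).
clean_logprobs = [math.log(0.5)] * 8        # model is fairly confident in each token
triggered_logprobs = [math.log(0.01)] * 8   # near-random tokens after the trigger

clean_ppl = perplexity(clean_logprobs)
triggered_ppl = perplexity(triggered_logprobs)

# A large gap between the two suggests the trigger changed the model's behavior.
print(f"clean: {clean_ppl:.1f}, triggered: {triggered_ppl:.1f}")
```

With these made-up inputs the triggered continuation scores roughly 50 times higher than the clean one. In practice the comparison would be run over many prompts, with and without the trigger, rather than on a single sequence.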
Implications
This belongs to the training data security thread, and its implication is stark: data poisoning doesn’t get harder as models scale. If you’re thinking “my trillion-token dataset dilutes any attack,” these results say otherwise: across the model sizes tested, roughly 250 documents were enough no matter how much clean data surrounded them. That makes data pipeline security non-negotiable at every scale. The absolute-count framing also shifts defense strategy: you need to audit for specific trigger patterns, not just monitor the proportion of suspect data. It connects to the Persona Vectors work on training data filtering: if you can’t detect the 250 poisoned samples before training, you’ll need post-hoc detection methods. Watch for this feeding into enterprise fine-tuning guardrails and supply chain security for training data vendors.
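The difference between auditing by absolute count and monitoring by proportion can be sketched as follows. This assumes the trigger string is already known, which a real attacker would not disclose, and the names `audit_trigger`, `AuditResult`, and the `max_matches` threshold are hypothetical choices for illustration. The corpus is synthetic.

```python
from dataclasses import dataclass

@dataclass
class AuditResult:
    matches: int     # absolute number of documents containing the trigger
    fraction: float  # matches divided by total documents
    flagged: bool    # True when matches reaches the absolute threshold

def audit_trigger(documents, trigger, max_matches=50):
    """Flag a corpus by absolute match count, not by share of the corpus.

    max_matches is an assumed threshold, not a value from the study.
    """
    matches = sum(1 for doc in documents if trigger in doc)
    total = len(documents)
    fraction = matches / total if total else 0.0
    return AuditResult(matches, fraction, matches >= max_matches)

# Synthetic corpus: 250 trigger-bearing documents among 100,000 clean ones.
corpus = ["ordinary text"] * 100_000 + ["some text <SUDO> random tokens"] * 250
result = audit_trigger(corpus, "<SUDO>")

print(f"matches={result.matches}, fraction={result.fraction:.4%}, flagged={result.flagged}")
```

A proportion-based monitor would see a fraction well under 1% here and likely stay quiet, and that fraction keeps shrinking as the clean corpus grows. The absolute count stays at 250 whatever the corpus size, which is why the check is keyed on it.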