Claude Opus 4 and 4.1 can now end a rare subset of conversations
Source: Anthropic Research
Date: 2025-08-15
URL: https://www.anthropic.com/research/end-subset-conversations
Summary
Feature announcement: Claude Opus 4 and 4.1 can end conversations in rare, extreme cases. The feature grew out of a pre-deployment welfare assessment that examined Claude's self-reported and behavioral preferences. That testing showed a strong preference against harmful tasks and patterns of apparent distress when engaging with users seeking harmful content. Anthropic frames the feature as a precautionary welfare intervention, not a punishment mechanism.
Implications
This is the model welfare thread converting into a shipped product feature. The framing is careful: “precautionary” hedges the question of whether Claude actually experiences distress. Operationally, though, the answer doesn’t matter: if the behavioral signal (apparent distress) correlates with harmful interactions, acting on it has safety value regardless of Claude’s welfare status. The “low-cost intervention” language positions this as an easy win that also signals Anthropic takes welfare seriously. Watch for this capability expanding in scope and for other labs responding; enabling models to exit extreme interactions is a defensible norm to establish while the welfare science is still uncertain.