From hard refusals to safe-completions: toward output-centric safety training
Source: OpenAI
Date: 2025-08-07
URL: https://openai.com/index/gpt-5-safe-completions
Summary
OpenAI published research on a shift in safety training philosophy for GPT-5: a move from “hard refusals” (the model declines to engage with a request at all) toward “safe-completions” (the model responds as helpfully as it can while keeping the output itself safe). This output-centric approach favors useful responses that do not cause harm over blanket refusals that frustrate legitimate use cases.
Implications
Safety/alignment thread. The shift from hard refusals to safe-completions is among the more consequential changes in alignment philosophy of the GPT-5 era. Hard refusals are a crude instrument: they prevent some harmful completions but also block a large amount of legitimate use, which produced the “over-refusal” criticism directed at earlier ChatGPT versions. Output-centric safety means completing the request helpfully without providing uplift for harmful purposes, and that requires much more sophisticated model judgment about what the output actually enables. This research reflects the maturation of safety training from rule-following to contextual judgment. It informed GPT-5’s reduced refusal rate on legitimate queries while maintaining safety on genuinely dangerous requests.