2025-08-07 · OpenAI

From hard refusals to safe-completions: toward output-centric safety training



Source: OpenAI Date: 2025-08-07 URL: https://openai.com/index/gpt-5-safe-completions

Summary

OpenAI published research on a shift in safety training philosophy for GPT-5: moving from “hard refusals” (the model judges the request and either complies fully or refuses outright) toward “safe-completions” (the model answers as helpfully as it can while keeping the output itself within safety limits, for example by giving high-level information and declining only the harmful specifics). The output-centric approach prioritizes useful responses that don’t cause harm over blanket refusals that frustrate legitimate use cases.

Implications

Safety/alignment thread. The shift from hard refusals to safe-completions is one of the most consequential changes in alignment philosophy of the GPT-5 era. Hard refusals are a crude instrument: they prevent some harmful completions but also block large amounts of legitimate use, which produced the “over-refusal” criticism that plagued earlier ChatGPT versions. Output-centric safety, meaning completing the request in a way that is helpful but provides no meaningful uplift toward harmful ends, requires much more sophisticated judgment from the model about what its output actually enables. The research reflects a maturation of safety training from rule-following toward contextual judgment, and it informed GPT-5’s substantially lower refusal rate on legitimate queries while maintaining safety on genuinely dangerous requests.
