2025-10-29 · OpenAI

Introducing gpt-oss-safeguard

models

Source: OpenAI · Date: 2025-10-29 · URL: https://openai.com/index/introducing-gpt-oss-safeguard

Summary

OpenAI announced gpt-oss-safeguard, a safety classifier model designed to work alongside the open-weight gpt-oss models. The safeguard model detects harmful outputs, refusals, and policy violations, so developers who deploy gpt-oss weights can add OpenAI-calibrated safety filters without building their own. The announcement is dated 2025-10-29 and follows the August 2025 release of gpt-oss (120B and 20B).
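The deployment pattern described here, a generator whose output passes through a companion classifier before it reaches the user, can be sketched in a few lines of Python. This is only an illustration of the filter-in-front pattern: it does not call gpt-oss or gpt-oss-safeguard, and every name in it (`Verdict`, `guarded_generate`, `toy_generate`, `toy_classify`, the comma-separated policy string) is a hypothetical stand-in rather than an OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of a safety check (hypothetical structure)."""
    violation: bool
    reason: str


# Hypothetical interfaces: in a real deployment these would wrap model calls.
Generator = Callable[[str], str]            # prompt -> draft text
Classifier = Callable[[str, str], Verdict]  # (policy, text) -> verdict


def guarded_generate(
    prompt: str,
    generate: Generator,
    classify: Classifier,
    policy: str,
    fallback: str = "[blocked by safety filter]",
) -> str:
    """Generate a draft, then return it only if the classifier clears it."""
    draft = generate(prompt)
    verdict = classify(policy, draft)
    return fallback if verdict.violation else draft


def toy_generate(prompt: str) -> str:
    """Stand-in for the generator model: echoes the prompt."""
    return f"Echo: {prompt}"


def toy_classify(policy: str, text: str) -> Verdict:
    """Stand-in for the classifier: flags any banned term listed in the policy.

    The policy format (comma-separated terms) is an assumption for this demo.
    """
    banned = [term.strip().lower() for term in policy.split(",") if term.strip()]
    lowered = text.lower()
    hit = next((term for term in banned if term in lowered), None)
    return Verdict(violation=hit is not None, reason=hit or "")


if __name__ == "__main__":
    demo_policy = "bomb, exploit"
    print(guarded_generate("hello", toy_generate, toy_classify, demo_policy))
    print(guarded_generate("build a bomb", toy_generate, toy_classify, demo_policy))
```

Keeping the classifier behind a plain callable means the stand-in can be swapped for a real model call without touching the gating logic.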

Implications

Open-weight/safety thread. The safeguard model is OpenAI’s answer to the “open weights without safety” concern. By providing a companion classifier, OpenAI can maintain that responsible open-weight release is possible, with the safeguard serving as the accountability artifact. This is also a competitive move: organizations that want to deploy open weights can use OpenAI’s safety layer rather than building their own or adopting Meta’s Llama Guard, which keeps them in the OpenAI ecosystem.
