2025-12-22 · OpenAI

Continuously hardening ChatGPT Atlas against prompt injection

securityagentsmodels

read at source ↗ openai.com

Continuously hardening ChatGPT Atlas against prompt injection

Source: OpenAI Date: 2025-12-22 URL: https://openai.com/index/hardening-atlas-against-prompt-injection

Summary

Title-only: OpenAI describes their ongoing work hardening “ChatGPT Atlas” — their internal browsing and tool-use agent architecture — against prompt injection attacks. Published December 2025, this is a companion to the November 2025 “Understanding prompt injections” educational post, shifting from explaining the problem to describing the defensive engineering. Atlas is the system that handles ChatGPT’s web search and tool use, making it the primary injection surface in production.

Implications

The agent security hardening thread. “Continuously hardening” signals this is an operational security practice, not a one-time fix — prompt injection in agentic systems is a moving target. The Atlas system processes untrusted web content, email, and document data on behalf of users, which is precisely the attack surface that makes injection viable. Publishing the hardening work builds developer trust in the safety of agent deployments while also sharing the defensive methodology.

Architecture-level injection defense. The technical interest here is whether OpenAI is implementing architecture-level defenses (privileged vs. unprivileged content channels, system prompt isolation) or just training-based mitigations (fine-tuning to resist injections). Architecture-level approaches are more durable; training-based approaches tend to be bypassed as attackers adapt. The specifics of Atlas’s hardening would inform the field’s approach to agent security.

← all signals