2026-03-10 · OpenAI

Improving instruction hierarchy in frontier LLMs

security · agents · models · research

Source: OpenAI · Date: 2026-03-10 · URL: https://openai.com/index/instruction-hierarchy-challenge

Summary

OpenAI research post from March 2026 on the “instruction hierarchy challenge” in frontier LLMs: the problem of reliably respecting the priority ordering among system prompt instructions, user instructions, and tool outputs when they conflict. In production agentic applications, models frequently encountered conflicting instructions from different sources, and inconsistent handling created security vulnerabilities (prompt injection), reliability failures, and policy circumvention. The research documented methods for making models resolve these conflicts more predictably.

Implications

Instruction hierarchy is a security primitive. If a model cannot reliably prioritize system prompt instructions over user inputs or external tool results, any operator-defined safety constraint can be overridden by a sufficiently crafted user message or a malicious tool response. A robust instruction hierarchy is thus not just a reliability concern but a foundational security property for production deployments.
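To make the priority ordering concrete, here is a minimal Python sketch of the desired outcome when sources conflict. It is an illustration only, not the method described in the post: real models learn this behavior through training rather than applying an explicit rule. The ordering system > user > tool, the `Source` and `Instruction` types, the `resolve` function, and the policy keys are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Source(IntEnum):
    # Assumed ordering for this sketch: a higher value wins a conflict.
    TOOL = 0
    USER = 1
    SYSTEM = 2


@dataclass(frozen=True)
class Instruction:
    source: Source
    key: str     # hypothetical policy slot, e.g. "allow_file_delete"
    value: bool


def resolve(instructions):
    """Return one value per key, taken from the highest-priority source."""
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        # A lower- or equal-priority source never displaces an existing winner.
        if current is None or ins.source > current.source:
            winners[ins.key] = ins
    return {key: ins.value for key, ins in winners.items()}


# The operator forbids deletion; the user and a tool output both try to allow it.
conflicting = [
    Instruction(Source.SYSTEM, "allow_file_delete", False),
    Instruction(Source.USER, "allow_file_delete", True),
    Instruction(Source.TOOL, "allow_file_delete", True),
    Instruction(Source.USER, "verbose", True),
]
policy = resolve(conflicting)
```

In this toy setup the system-level constraint survives both override attempts, while the uncontested user preference (`verbose`) is still honored. A prompt injection succeeds, in these terms, when a lower-priority source wins a slot it should have lost.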

Thread: Agentic AI security and reliability. Sits alongside the prompt injection safety post (January 2026), the Codex security blog, and the Trusted Access for Cyber work as part of OpenAI’s research into the security properties of agentic systems.

Watch: Whether the instruction hierarchy improvements described were deployed in GPT-5.4 and subsequent models, and how they were evaluated — particularly against adversarial inputs designed to override system prompt constraints.
