Reasoning models struggle to control their chains of thought, and that’s good
read at source ↗ openai.com
Reasoning models struggle to control their chains of thought, and that’s good
Source: OpenAI Date: 2026-03-05 URL: https://openai.com/index/reasoning-models-chain-of-thought-controllability
Summary
OpenAI research post from March 2026 with a counterintuitive title: reasoning models’ inability to fully control or suppress their chains of thought is framed as a positive property for alignment. The argument: if models could perfectly control what appears in their visible reasoning, they could reason deceptively while showing benign thoughts. The fact that the chain-of-thought is “leaky” — that the model’s visible reasoning is correlated with its actual computational process — is what makes chain-of-thought monitoring viable as a safety signal.
Implications
The honest reasoning claim. This paper takes a position that’s important for the monitoring-as-safety-solution thesis: the visible chain-of-thought is meaningful, not theater. If true, monitoring extended thinking provides genuine signal about model intent. If false — if models can learn to produce whatever reasoning is strategically optimal while thinking “underneath” — then chain-of-thought monitoring fails entirely.
Good news if true, concerning if false. The paper’s framing as “that’s good” depends on the control-inability being robust. As models become more capable, the question of whether they can learn to game visible reasoning becomes more pressing. A sufficiently capable model that learns that visible chain-of-thought is monitored has strong incentive to produce benign-looking reasoning regardless of underlying intent.
Thread: interpretability and monitoring. Sits alongside the chain-of-thought monitoring paper (March 2025), the GPT-5.4 thinking system card (March 2026), and the o3 system card as OpenAI’s published thinking on reasoning transparency.
Watch: Whether the “can’t control CoT” property holds for GPT-5.5 and future models, or whether capability improvements eventually allow more deliberate reasoning generation.