Trading inference-time compute for adversarial robustness
Source: OpenAI Date: 2025-01-22 URL: https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness
Summary
OpenAI published research showing that scaling inference-time compute, the technique underlying the o-series reasoning models, can improve adversarial robustness as well as benchmark performance on benign tasks. When a model spends more compute at inference time on verification and self-checking, it becomes harder to jailbreak and more resistant to adversarial prompting.
Implications
Safety/alignment thread. The adversarial robustness finding is significant because it suggests that test-time compute scaling has safety benefits beyond capability improvements. If more inference compute makes models harder to misuse, then capability and safety interests point in the same architectural direction, a convergence that does not always hold. The January 2025 publication date coincides with OpenAI scaling up the o-series line toward o3, for which the safety evaluations were more intensive. The research also partially addresses a concern about reasoning models: that extended thinking might help a model find more sophisticated ways to comply with harmful requests. The adversarial robustness evidence suggests the opposite may be true, with the caveat that the conclusion depends on which specific adversarial scenarios were tested.