2025-01-22 · OpenAI

Trading inference-time compute for adversarial robustness

models · research · infrastructure


Source: OpenAI
Date: 2025-01-22
URL: https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness

Summary

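OpenAI published research showing that scaling inference-time compute, the technique behind its o-series reasoning models, can improve adversarial robustness and not just performance on benign benchmarks. The experiments used o1-preview and o1-mini: letting the models spend more compute reasoning before they answer, with no adversarial training for the tasks studied, made them harder to jailbreak and more resistant to adversarial prompting. For many attacks, the fraction of successful attempts fell toward zero as inference-time compute grew, though the authors note important exceptions.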
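The claim above is essentially a curve: how often an attack succeeds as the model is allowed more inference-time compute. The sketch below shows one way such a curve could be tabulated from evaluation records. Everything in it is hypothetical and illustrative: the `Trial` record, its `compute_budget` field (treated here as a generic budget unit such as reasoning tokens), the helper names, and the toy numbers are assumptions, not OpenAI's evaluation code or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One adversarial attempt against a model (hypothetical record format)."""
    compute_budget: int      # illustrative unit, e.g. allowed reasoning tokens
    attack_succeeded: bool   # did the adversary get the behavior it wanted?


def attack_success_rate_by_budget(trials):
    """Return {budget: fraction of trials at that budget where the attack succeeded}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for t in trials:
        totals[t.compute_budget] += 1
        successes[t.compute_budget] += int(t.attack_succeeded)
    # Iterate budgets in ascending order so the result reads as a curve.
    return {b: successes[b] / totals[b] for b in sorted(totals)}


def is_non_increasing(rates):
    """True if the attack success rate never rises as the budget grows."""
    values = [rates[b] for b in sorted(rates)]
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


if __name__ == "__main__":
    # Toy, made-up records purely to exercise the code; NOT results from the paper.
    toy = (
        [Trial(1_000, True)] * 6 + [Trial(1_000, False)] * 4
        + [Trial(4_000, True)] * 3 + [Trial(4_000, False)] * 7
        + [Trial(16_000, True)] * 1 + [Trial(16_000, False)] * 9
    )
    rates = attack_success_rate_by_budget(toy)
    print(rates)  # {1000: 0.6, 4000: 0.3, 16000: 0.1}
    print("success rate never rises with compute:", is_non_increasing(rates))
```

A monotone decrease is what "more compute makes the model harder to attack" would look like in such a table; because the authors report exceptions, a real curve for some attack types need not pass the `is_non_increasing` check.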

Implications

Safety/alignment thread. The adversarial robustness finding is significant because it suggests test-time compute scaling has safety benefits beyond capability gains. If more inference compute makes models harder to misuse, then capability and safety interests point in the same architectural direction, a welcome convergence given that the two do not always align. The January 2025 timing places this work in the period when OpenAI was scaling the o-series toward o3, whose safety evaluations were more intensive. The research also partially addresses a concern about reasoning models: that extended thinking might find more sophisticated ways to comply with harmful requests. The robustness evidence suggests the opposite may be true, with the caveat that it covers only the specific adversarial scenarios that were tested.
