2024-10-30 · OpenAI

Introducing SimpleQA

Source: OpenAI
Date: 2024-10-30
URL: https://openai.com/index/introducing-simpleqa

Summary

OpenAI introduced SimpleQA in October 2024. It is a factuality benchmark built from short, fact-seeking questions, each with a single, unambiguous correct answer. SimpleQA is designed to measure how accurately models answer well-defined factual questions, giving a cleaner signal about hallucination rates than complex, open-ended benchmarks.

Implications

Research/evaluation thread. SimpleQA addresses a genuine benchmark-design problem. Many factuality evaluations use questions whose correct answer is ambiguous, context-dependent, or dependent on information more recent than a model's training data, which makes it hard to separate hallucination from other failure modes. A benchmark of simple, unambiguous factual questions removes those confounds, so a wrong answer more directly reflects a factual error. The release also reflects a growing meta-level conversation about what benchmarks actually measure, and whether capability gains on benchmarks translate into better factual reliability in practice. For current models the answer is mixed: SimpleQA scores improve with model scale, but hallucination in open-ended generation remains significant even for high-scoring models.
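
To make the "clean signal" idea concrete, here is a minimal Python sketch of how per-question grades might be summarized. It assumes each model answer has already been graded as correct, incorrect, or not attempted (SimpleQA's published grading uses these three categories). The function name, label strings, and exact metric set are illustrative assumptions, not OpenAI's evaluation code. The key design choice is keeping "incorrect" separate from "not attempted": a confident wrong answer (a hallucination) is then counted apart from a model declining to answer.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label strings, assumed to mirror a three-way grading scheme.
CORRECT = "correct"
INCORRECT = "incorrect"
NOT_ATTEMPTED = "not_attempted"


def simpleqa_style_metrics(grades: Iterable[str]) -> dict[str, float]:
    """Summarize per-question grades (hypothetical helper, not an official API)."""
    counts = Counter(grades)
    unknown = set(counts) - {CORRECT, INCORRECT, NOT_ATTEMPTED}
    if unknown:
        raise ValueError(f"unexpected grade labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no grades supplied")
    attempted = counts[CORRECT] + counts[INCORRECT]

    # Share of all questions answered correctly.
    overall_correct = counts[CORRECT] / total
    # Among questions the model chose to answer, how often it was right.
    correct_given_attempted = counts[CORRECT] / attempted if attempted else 0.0
    # Confident wrong answers: the hallucination signal, kept apart from abstentions.
    incorrect_rate = counts[INCORRECT] / total
    not_attempted_rate = counts[NOT_ATTEMPTED] / total

    # Assumed summary score: harmonic mean of the two accuracy figures.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "incorrect_rate": incorrect_rate,
        "not_attempted_rate": not_attempted_rate,
        "f_score": f_score,
    }


if __name__ == "__main__":
    # Toy grades for 10 questions: 6 correct, 2 incorrect, 2 not attempted.
    grades = [CORRECT] * 6 + [INCORRECT] * 2 + [NOT_ATTEMPTED] * 2
    for name, value in simpleqa_style_metrics(grades).items():
        print(f"{name}: {value:.3f}")
```

On the toy input, overall accuracy is 0.6 while accuracy on attempted questions is 0.75. A model that abstained more often could raise the second number without answering more questions correctly, which is why reporting the incorrect rate separately is useful.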
