Introducing SimpleQA
Source: OpenAI
Date: 2024-10-30
URL: https://openai.com/index/introducing-simpleqa
Summary
OpenAI introduced SimpleQA in October 2024. It is a factuality benchmark built from simple, unambiguous factual questions, each with a single clear correct answer. It was designed to measure model factual accuracy on well-defined questions and to give a cleaner signal about hallucination rates than complex open-ended benchmarks.
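To make "a cleaner signal" concrete, here is a minimal sketch of how per-question grades on a short-answer factuality benchmark could be rolled up into summary rates. It assumes each answer has already been graded as correct, incorrect, or not attempted. The label strings, the `summarize_grades` function, and the metric names are hypothetical choices for this illustration, not OpenAI's implementation.

```python
from collections import Counter

# Hypothetical grade labels, assumed for this sketch.
CORRECT = "correct"
INCORRECT = "incorrect"
NOT_ATTEMPTED = "not_attempted"


def summarize_grades(grades):
    """Aggregate per-question grades into simple factuality rates."""
    counts = Counter(grades)
    total = len(grades)
    # A question counts as attempted if the model committed to an answer.
    attempted = counts[CORRECT] + counts[INCORRECT]
    return {
        # Share of all questions answered correctly.
        "overall_correct": counts[CORRECT] / total if total else 0.0,
        # Share of all questions where the model gave an answer at all.
        "attempt_rate": attempted / total if total else 0.0,
        # Accuracy restricted to the questions the model attempted.
        "correct_given_attempted": counts[CORRECT] / attempted if attempted else 0.0,
        # Share of all questions answered wrongly: a rough hallucination proxy.
        "incorrect_rate": counts[INCORRECT] / total if total else 0.0,
    }


# Illustrative, made-up grades for four questions.
grades = [CORRECT, CORRECT, INCORRECT, NOT_ATTEMPTED]
summary = summarize_grades(grades)
```

Keeping "incorrect" separate from "not attempted" is the design choice that matters here: a model that declines to answer is not penalized as if it had asserted something false, so the incorrect rate tracks confident wrong answers more closely.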
Implications
Research/evaluation thread. SimpleQA addresses a genuine benchmark design problem: many factuality evaluations include questions whose correct answer is ambiguous, context-dependent, or dependent on recent information the model lacks, which makes it hard to isolate hallucination rates from other failure modes. A benchmark of simple, unambiguous factual questions gives a cleaner signal. OpenAI's publication of SimpleQA also reflects the growing meta-level conversation about what benchmarks actually measure and whether benchmark gains translate into better factual reliability in practice. For current models the answer is mixed: SimpleQA scores improve with model scale, but hallucination in open-ended generation remains significant even for high-scoring models.