Measuring the performance of our models on real-world tasks
Source: OpenAI
Date: 2025-09-25
URL: https://openai.com/index/gdpval
Summary
Title-only: OpenAI publishes “GDPVal”, or a related framework for measuring model performance on real-world tasks; the URL slug suggests an assessment of GDP-linked value. Published in September 2025, the post likely presents OpenAI’s methodology for evaluating models on tasks that correspond to real economic value, moving beyond academic benchmarks toward measuring AI’s actual productivity impact on knowledge work.
Implications
The real-world evaluation thread. Benchmark saturation was a known problem by mid-2025: the models that top leaderboards don’t necessarily perform best on actual work tasks. By publishing a framework for measuring real-world tasks, OpenAI is attempting to own the narrative around what “model performance” means in practice. If GDPVal measures value creation on knowledge-worker tasks, it implicitly frames model comparison in economic rather than academic terms, which benefits the models with the strongest enterprise adoption.
Economic measurement as competitive positioning. Publishing economic impact measurement alongside model launches creates a new evaluation surface where OpenAI’s enterprise penetration, and the real-world task data that comes with it, becomes a structural advantage. Competitors with less enterprise deployment have less real-world grounding for their performance claims. Watch whether third-party evaluators adopt this framework or whether it remains an OpenAI-proprietary metric.