Introducing next-generation audio models in the API
Source: OpenAI Date: 2025-03-20 URL: https://openai.com/index/introducing-our-next-generation-audio-models
Summary
In March 2025, OpenAI introduced next-generation audio models in the API: improved speech-to-text models (successors to Whisper), text-to-speech models with more natural prosody, and the audio components of the Realtime API for low-latency speech applications. The models offered significant quality improvements over the previous generation in accent recognition, background-noise handling, and natural-sounding speech synthesis.
Implications
Platform/multimodal thread. Audio model improvements are primarily a developer infrastructure story: better audio models in the Realtime API make voice-first applications more viable for production use. The quality improvements matter most for diverse accents and noisy environments, the scenarios where prior Whisper performance degraded significantly. In the context of Operator and other agent products, better audio models also feed into voice-controlled agent interfaces. Competitive pressure from ElevenLabs (TTS), AssemblyAI (STT), and Deepgram drove OpenAI to maintain audio model parity as enterprise voice application developers evaluated alternatives.