2025-03-20 · OpenAI

Introducing next-generation audio models in the API

infrastructure

Source: OpenAI · Date: 2025-03-20 · URL: https://openai.com/index/introducing-our-next-generation-audio-models

Summary

On March 20, 2025, OpenAI introduced next-generation audio models in the API: improved speech-to-text models (gpt-4o-transcribe and gpt-4o-mini-transcribe, positioned as successors to Whisper), a text-to-speech model with more natural prosody (gpt-4o-mini-tts), and the audio components of the Realtime API for low-latency speech applications. OpenAI reported significant quality gains over the previous generation in accent recognition, background-noise handling, and natural-sounding speech synthesis.
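To make the text-to-speech side concrete, here is a minimal sketch of what a request might look like using only the Python standard library. Nothing in it comes from the source article: the endpoint URL, the model name `gpt-4o-mini-tts`, the voice `alloy`, and the `instructions` field are assumptions to verify against the current API reference, and the helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current OpenAI API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="gpt-4o-mini-tts",
                         voice="alloy", instructions=None):
    """Build (but do not send) a text-to-speech HTTP request.

    The model, voice, and instructions values are illustrative assumptions.
    """
    payload = {"model": model, "input": text, "voice": voice}
    if instructions:
        # Optional free-text guidance on speaking style (assumed field name).
        payload["instructions"] = instructions
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as out_file:
            out_file.write(response.read())
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made. Calling `synthesize(build_speech_request("Hello", api_key), "speech.mp3")` would perform the actual request, assuming a valid key and that the service returns audio in its default format.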

Implications

Platform/multimodal thread. The audio model improvements are primarily a developer-infrastructure story: pairing the Realtime API with better audio models makes voice-first applications more viable in production. The quality gains matter most for diverse accents and noisy environments, the scenarios where Whisper's accuracy previously degraded significantly. In the context of Operator and other agent products, better audio models also feed into voice-controlled agent interfaces. Competitive pressure from ElevenLabs (TTS), AssemblyAI (STT), and Deepgram pushed OpenAI to keep its audio models at parity as enterprise voice developers evaluated alternatives.
