2026-05-07 · OpenAI

OpenAI Voice Intelligence: Three New API Models

pricing · models

Source: OpenAI Blog
Date: 2026-05-07
URL: https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/

Summary

OpenAI introduced three audio models in the API: GPT-Realtime-2 (GPT-5-class reasoning; 96.6% on BigBench Audio vs. 81.4% for v1.5), GPT-Realtime-Translate (live translation from 70+ input languages into 13 output languages), and GPT-Realtime-Whisper (streaming live speech-to-text). Pricing: GPT-Realtime-2 at $32 input / $64 output per 1M audio tokens, GPT-Realtime-Translate at $0.034/min, and GPT-Realtime-Whisper at $0.017/min.
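To make the per-minute rates concrete, here is a minimal Python sketch that converts them into a cost for a given amount of audio. The two rates are the ones quoted in the summary; the helper name `per_minute_cost` and the assumption of simple linear billing (no rounding, minimums, or volume discounts) are illustrative and not taken from the source.

```python
from decimal import Decimal

# Per-minute rates quoted in the summary (USD per minute of audio).
RATES_PER_MINUTE = {
    "translate": Decimal("0.034"),  # GPT-Realtime-Translate
    "whisper": Decimal("0.017"),    # GPT-Realtime-Whisper
}


def per_minute_cost(model: str, minutes: int) -> Decimal:
    """Return the USD cost of `minutes` of audio at the quoted flat rate.

    Hypothetical helper: assumes cost scales linearly with duration,
    which real billing may not do exactly.
    """
    return RATES_PER_MINUTE[model] * minutes


if __name__ == "__main__":
    # Cost of one hour of audio for each per-minute model.
    for model in RATES_PER_MINUTE:
        print(model, per_minute_cost(model, 60))
```

Under these assumptions, one hour of audio works out to $2.04 for Translate and $1.02 for Whisper.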

Implications

Agent modality thread: voice interaction is now available in two of six major coding agent ecosystems (Gemini CLI v0.41.0 voice mode and the OpenAI voice API). The boundary between “chat with an agent” and “talk to an agent” is dissolving, and developer tooling is becoming multimodal.
Enterprise deployment: GPT-5-class reasoning in voice means voice agents can handle complex multi-step requests, not just command parsing. Customer service, internal help desk, and live translation use cases become viable at the API level.
Pricing structure: voice token pricing ($32/1M input) is 6.4x text pricing ($5/1M). The premium likely reflects compute intensity, but it creates a meaningful cost barrier for high-volume voice agent deployments.
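The 6.4x premium can be checked with a short calculation. The sketch below uses only the two prices quoted in the pricing note; the function names and the 250M-token monthly volume are hypothetical, chosen to show how the gap scales for a high-volume deployment.

```python
from decimal import Decimal

# Prices quoted in the pricing note, in USD per 1M tokens.
VOICE_INPUT_PER_1M = Decimal("32")
TEXT_PER_1M = Decimal("5")


def premium_ratio() -> Decimal:
    """Ratio of voice input price to text price."""
    return VOICE_INPUT_PER_1M / TEXT_PER_1M


def token_cost(tokens: int, price_per_1m: Decimal) -> Decimal:
    """USD cost of `tokens` tokens at a per-1M-token price (linear billing assumed)."""
    return price_per_1m * tokens / Decimal(1_000_000)


if __name__ == "__main__":
    monthly_tokens = 250_000_000  # hypothetical volume, for illustration only
    print("ratio:", premium_ratio())
    print("voice:", token_cost(monthly_tokens, VOICE_INPUT_PER_1M))
    print("text: ", token_cost(monthly_tokens, TEXT_PER_1M))
```

At that illustrative volume, input alone would cost $8,000 as voice versus $1,250 as text, which is the same 6.4x gap expressed in dollars.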
