OpenAI Realtime API vs Whisper API

A feature-by-feature comparison of the OpenAI Realtime API and the Whisper API

Feature
OpenAI Realtime API
Whisper API

Capability Features

Enterprise Privacy Commitment
Expanded Model Support Planned
Five New Voices
5
Function Calling
Human and Automated Safety Monitoring
Interruption Handling
Language/Country Support
100+ languages
Latest Whisper Model
Whisper Large V3
No Training on Data Without Permission
Non-Developer Access
Playground Access
Prompt Caching Planned
Public Beta
Reference Client Available
Response Format Selection
json
Scale for Millions
Six Preset Voices
6
Speaker Diarization
Speaker Labels
Speech-to-Speech
Streaming Audio Inputs/Outputs
Summary Generation
Supports Text and Audio Inputs
Text, Audio
Translation Capability
Ultra Low Latency
WebSocket Connection

Integration Features

Agora Integration
API Integrations
Chat Completions API Integration
File Formats Supported
mp3, video, podcasts, meetings
LiveKit Integration
OpenAI API Compatibility
OpenAI Node.js SDK Planned
OpenAI Python SDK Planned
Programming Language Agnostic
Supports GPT-4o
gpt-4o-realtime-preview
Twilio Voice API Integration
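
To make the WebSocket connection and the gpt-4o-realtime-preview model listed above more concrete, here is a minimal sketch that only builds the connection parameters and does not open a socket. The endpoint URL and the `OpenAI-Beta` header are assumptions based on the public beta documentation and should be checked against the current API reference; `realtime_connection_params` is a hypothetical helper name, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed public-beta endpoint; verify against the current API reference.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"


def realtime_connection_params(api_key, model="gpt-4o-realtime-preview"):
    """Return the (url, headers) pair a WebSocket client would need.

    Hypothetical helper for illustration only; it performs no network I/O.
    """
    url = f"{REALTIME_ENDPOINT}?{urlencode({'model': model})}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumed beta opt-in header; may change once the API leaves beta.
        "OpenAI-Beta": "realtime=v1",
    }
    return url, headers


if __name__ == "__main__":
    url, headers = realtime_connection_params("sk-your-key-here")
    print(url)
    print(sorted(headers))
```

Because the result is a plain URL and header dictionary, it can be handed to any WebSocket client, which fits the "Programming Language Agnostic" entry in the list.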

Limitation Features

AI Disclosure Requirement
Audio Only Modality (Initially)
Lower Session Limits Tiers 1-4
Lower than 100
No OpenAI Affiliation
No Simultaneous Session Limit Anymore
Simultaneous Sessions Limit Tier 5
100
Usage Policy Restriction

Pricing Features

Approximate Audio Input Price
$0.06/minute
Approximate Audio Output Price
$0.24/minute
Free Transcription Hours
30
Free Trial Package
Hourly Pricing
$0.17/hour
No Free Tier
Pricing Audio Input
$100/1M tokens
Pricing Audio Output
$200/1M tokens
Pricing Cached Audio Input
$20/1M tokens
Pricing Cached Text Input
$2.50/1M tokens
Pricing Text Input
$5/1M tokens
Pricing Text Output
$20/1M tokens
Trial Period
1 month
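
The per-token and per-minute audio prices above can be related with a little arithmetic. The sketch below uses only the figures listed in this section; the token counts in the usage example are hypothetical, and the tokens-per-minute values are merely implied by dividing the listed per-minute price by the listed per-token price, not official numbers.

```python
# Listed prices in USD per 1M tokens (copied from the pricing list above).
PRICE_PER_MILLION = {
    "text_input": 5.00,
    "cached_text_input": 2.50,
    "text_output": 20.00,
    "audio_input": 100.00,
    "cached_audio_input": 20.00,
    "audio_output": 200.00,
}


def cost_usd(token_counts):
    """Total cost in USD for a mapping of price category -> token count."""
    return sum(
        PRICE_PER_MILLION[kind] * count / 1_000_000
        for kind, count in token_counts.items()
    )


def implied_tokens_per_minute(price_per_minute, price_per_million):
    """Tokens per minute implied by a per-minute and a per-1M-token price."""
    return price_per_minute / price_per_million * 1_000_000


if __name__ == "__main__":
    # Approximate audio prices listed above: $0.06/min in, $0.24/min out.
    print(round(implied_tokens_per_minute(0.06, PRICE_PER_MILLION["audio_input"])))
    print(round(implied_tokens_per_minute(0.24, PRICE_PER_MILLION["audio_output"])))
    # Hypothetical session: token counts chosen purely for illustration.
    print(round(cost_usd({"audio_input": 600, "audio_output": 1200}), 2))
```

Under these figures, one minute of audio in each direction comes to roughly $0.06 + $0.24 = $0.30, which is how the approximate per-minute prices and the per-token prices line up.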