OpenAI Text To Speech vs OpenAI Realtime API

Comparing the features of OpenAI Text To Speech to OpenAI Realtime API

Feature
OpenAI Text To Speech
OpenAI Realtime API

Capability Features

Age Selection
Audio Settings
Audio Speed Setting Range
1
Bookmark Page
Country Selection
Create Speech Button
Custom Voice Selection
AlloyEchoFableOnyxNovaShimmerAshCoralSage
Enterprise Privacy Commitment
Expanded Model Support Planned
Favorite Voice Option
Five New Voices
5
Function Calling
Gender Selection
High Quality Voices
Human and Automated Safety Monitoring
Integrated Audio Player
Interruption Handling
No Training on Data Without Permission
Playground Access
Prompt Caching Planned
Public Beta
Reference Client Available
Reset Filters
Sample Playback
Six Preset Voices
6
Speech-to-Speech
Streaming Audio Inputs/Outputs
Supports Text and Audio Inputs
TextAudio
Text to Speech
Ultra Low Latency
Voice Characteristics
NeutralProfessionalClearWarmFriendlyEngagingEnergeticExpressiveMatureExperiencedYoungOldFemaleMaleLivelyVibrantDynamicCheerfulCommunity-orientedWiseCalmKnowledgeable
WebSocket Connection

Integration Features

Agora Integration
API Integrations
Chat Completions API Integration
LiveKit Integration
OpenAI Node.js SDK Planned
OpenAI Python SDK Planned
Supports GPT-4o
gpt-4o-realtime-preview
Twilio Voice API Integration

Limitation Features

AI Disclosure Requirement
Audio Only Modality (Initially)
Lower Session Limits Tiers 1-4
Lower than 100
No Simultaneous Session Limit Anymore
Pricing Information
Processing Delay
Simultaneous Sessions Limit Tier 5
100
Usage Policy Restriction
Video Tutorial Requirement
Watch 30 seconds

Pricing Features

Approximate Audio Input Price
$0.06/minute
Approximate Audio Output Price
$0.24/minute
No Free Tier
Pricing Audio Input
$100/1M tokens
Pricing Audio Output
$200/1M tokens
Pricing Cached Audio Input
$20/1M tokens
Pricing Cached Text Input
$2.50/1M tokens
Pricing Text Input
$5/1M tokens
Pricing Text Output
$20/1M tokens