Voice Dictation vs OpenAI Realtime API

A feature-by-feature comparison of Voice Dictation and the OpenAI Realtime API

Feature
Voice Dictation
OpenAI Realtime API

Capability Features

Enterprise Privacy Commitment
Expanded Model Support Planned
Five New Voices
5
Function Calling
Human and Automated Safety Monitoring
Interruption Handling
Local Storage Only
No Training on Data Without Permission
Playground Access
Prompt Caching Planned
Public Beta
Real-time Transcription
Reference Client Available
Six Preset Voices
6
Speech to Text
Speech-to-Speech
Streaming Audio Inputs/Outputs
Supported Language List
Afrikaans, Bahasa Indonesia, Bahasa Melayu, Català, Čeština, Dansk, Deutsch, English, Español, Euskara, Filipino, Français, Galego, hrvatski, Isizulu, Íslenska, Italiano, Lietuvių, Magyar, Nederlands, Norsk (Bokmål), Polski, Português, Română, Slovenčina, Slovenščina, Suomi, Svenska, Tiếng Việt, Türkçe, Ελληνικά, Български, Русский, Српски, Українська, עברית, العربية, فارسی, हिन्दी, اُردُو, አማርኛ, Azərbaycanca, বাংলা, ગુજરાતી, ಕನ್ನಡ, ភាសាខ្មែរ, Latviešu, മലയാളം, मराठी, ລາວ, नेपाली भाषा, සිංහල, Basa Sunda, తెలుగు, Kiswahili, ქართული, Հայերեն, தமிழ், ไทย, சிங்கப்பூர், 中文(中国), 中文(台灣), 中文(香港), 日本語, 한국어
Supports Text and Audio Inputs
Text, Audio
Ultra Low Latency
Voice Commands
WebSocket Connection

Integration Features

Agora Integration
Chat Completions API Integration
Google Speech Recognition
LiveKit Integration
OpenAI Node.js SDK Planned
OpenAI Python SDK Planned
Supported Platforms
Google Chrome, Windows, Mac, Linux
Supports GPT-4o
gpt-4o-realtime-preview
Twilio Voice API Integration

Limitation Features

AI Disclosure Requirement
Audio Only Modality (Initially)
Browser Compatibility
Google Chrome only
Lower Session Limits Tiers 1-4
Lower than 100
No API Access
No Export Formats Listed
No Mobile App
No Simultaneous Session Limit Anymore
No Team Collaboration
Requires Internet Connection
Simultaneous Sessions Limit Tier 5
100
Usage Policy Restriction

Pricing Features

Approximate Audio Input Price
$0.06/minute
Approximate Audio Output Price
$0.24/minute
Free Tier
No Free Tier
Pricing Audio Input
$100/1M tokens
Pricing Audio Output
$200/1M tokens
Pricing Cached Audio Input
$20/1M tokens
Pricing Cached Text Input
$2.50/1M tokens
Pricing Text Input
$5/1M tokens
Pricing Text Output
$20/1M tokens
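
To make the pricing rows above easier to apply, here is a minimal sketch that turns them into a cost estimate. The prices are copied from the table; the function names (`cost_by_minutes`, `cost_by_tokens`) and the usage dictionary keys are hypothetical and chosen only for illustration. The per-minute figures are listed as approximate, so treat the results as rough estimates rather than billing amounts.

```python
from decimal import Decimal
from typing import Mapping

# Approximate per-minute audio prices from the table above (USD).
PER_MINUTE = {
    "audio_input": Decimal("0.06"),
    "audio_output": Decimal("0.24"),
}

# Per-1M-token prices from the table above (USD).
PER_MILLION_TOKENS = {
    "audio_input": Decimal("100"),
    "audio_output": Decimal("200"),
    "cached_audio_input": Decimal("20"),
    "cached_text_input": Decimal("2.50"),
    "text_input": Decimal("5"),
    "text_output": Decimal("20"),
}


def cost_by_minutes(input_minutes: float, output_minutes: float) -> Decimal:
    """Rough audio cost from minutes of input and output speech."""
    return (
        PER_MINUTE["audio_input"] * Decimal(str(input_minutes))
        + PER_MINUTE["audio_output"] * Decimal(str(output_minutes))
    )


def cost_by_tokens(usage: Mapping[str, int]) -> Decimal:
    """Cost from token counts keyed by the categories in PER_MILLION_TOKENS."""
    total = Decimal("0")
    for category, tokens in usage.items():
        total += PER_MILLION_TOKENS[category] * Decimal(tokens) / Decimal(1_000_000)
    return total


if __name__ == "__main__":
    # Example: 10 minutes of user audio and 5 minutes of generated audio.
    print(cost_by_minutes(10, 5))
    # Example: 1M text input tokens and 500k text output tokens.
    print(cost_by_tokens({"text_input": 1_000_000, "text_output": 500_000}))
```

Decimal is used instead of float so that small per-token amounts do not pick up rounding noise when summed.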