Sesame vs OpenAI Realtime API

A feature-by-feature comparison of Sesame and the OpenAI Realtime API

Feature
Sesame
OpenAI Realtime API

Capability Features

Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size
1 million hours
Emotional Intelligence
Enterprise Privacy Commitment
Evaluation Suite
Expanded Model Support Planned
Five New Voices
5
Function Calling
Human and Automated Safety Monitoring
Interruption Handling
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multiple Speaker Handling
No Training on Data Without Permission
Objective Metrics
Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Playground Access
Prompt Caching Planned
Pronunciation Correction
Public Beta
Reference Client Available
Sequence Length
2048
Single-Stage Model
Six Preset Voices
6
Speech-to-Speech
Streaming Audio Inputs/Outputs
Subjective Metrics
Comparative Mean Opinion Score
Supports Text and Audio Inputs
Text, Audio
Text and Audio Input
Text, Audio
Training Epochs
5
Ultra Low Latency
WebSocket Connection

Integration Features

Agora Integration
Chat Completions API Integration
GitHub Release
LiveKit Integration
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
OpenAI Node.js SDK Planned
OpenAI Python SDK Planned
Supports GPT-4o
gpt-4o-realtime-preview
Twilio Voice API Integration

Limitation Features

AI Disclosure Requirement
Audio Only Modality (Initially)
Cannot Model Conversation Structure
English Language Dominance
Lower Session Limits Tiers 1-4
Lower than 100
Memory Bottleneck in Training
No Pre-trained Language Model Use
No Simultaneous Session Limit Anymore
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Simultaneous Sessions Limit Tier 5
100
Usage Policy Restriction

Pricing Features

Approximate Audio Input Price
$0.06/minute
Approximate Audio Output Price
$0.24/minute
Free Preview
No Free Tier
Open Source
Apache 2.0
Pricing Audio Input
$100/1M tokens
Pricing Audio Output
$200/1M tokens
Pricing Cached Audio Input
$20/1M tokens
Pricing Cached Text Input
$2.50/1M tokens
Pricing Text Input
$5/1M tokens
Pricing Text Output
$20/1M tokens
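
The per-token and per-minute audio prices above can be reconciled with a little arithmetic. The sketch below is illustrative only: the token rates (600 audio input tokens and 1,200 audio output tokens per minute) are not stated in the table. They are assumptions back-calculated from the approximate per-minute prices, and the function names are hypothetical.

```python
# Prices in USD per 1M tokens, copied from the pricing rows above.
PRICE_PER_MILLION = {
    "audio_input": 100.0,
    "audio_output": 200.0,
    "cached_audio_input": 20.0,
    "text_input": 5.0,
    "cached_text_input": 2.5,
    "text_output": 20.0,
}

# ASSUMPTION: tokens consumed per minute of audio. These rates are not in
# the table; they are implied by $0.06/minute and $0.24/minute.
AUDIO_TOKENS_PER_MINUTE = {
    "audio_input": 600,
    "audio_output": 1200,
}


def token_cost(kind: str, tokens: int) -> float:
    """Cost in USD for `tokens` tokens of the given kind."""
    return tokens * PRICE_PER_MILLION[kind] / 1_000_000


def audio_cost_per_minute(kind: str) -> float:
    """Approximate cost in USD of one minute of audio of the given kind."""
    return token_cost(kind, AUDIO_TOKENS_PER_MINUTE[kind])


def estimate_call_cost(input_minutes: float, output_minutes: float) -> float:
    """Rough audio-only cost of a call, ignoring text tokens and caching."""
    return (
        input_minutes * audio_cost_per_minute("audio_input")
        + output_minutes * audio_cost_per_minute("audio_output")
    )


if __name__ == "__main__":
    print(f"input:  ${audio_cost_per_minute('audio_input'):.2f}/minute")
    print(f"output: ${audio_cost_per_minute('audio_output'):.2f}/minute")
    print(f"10 min in, 5 min out: ${estimate_call_cost(10, 5):.2f}")
```

Under these assumptions, a call with 10 minutes of audio input and 5 minutes of audio output comes to about $1.80 in audio charges. Text tokens and cached input would change the total.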