WhisperUI vs Sesame

Comparing the features of WhisperUI to Sesame

Feature
WhisperUI
Sesame

Capability Features

Batch File Upload
Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size
1 million hours
Desktop Version
Drag & Drop Upload
Edit Transcription
Emotional Intelligence
Evaluation Suite
Export SRT Subtitles
Fast Transcription Speed
Most files within a few minutes
File Browse Upload
High Accuracy Model
High (depends on audio quality)
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multi-Language Transcription
Multiple Speaker Handling
Objective Metrics
Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Pronunciation Correction
Sequence Length
2048
Single-Stage Model
Speech to Text
Subjective Metrics
Comparative Mean Opinion Score
Supported Language List
English, Spanish, French, German, Chinese, and more
Text and Audio Input
Text, Audio
Text to Speech
Training Epochs
5
Translation to English
Unlimited Daily Uploads

Integration Features

GitHub Release
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
Supported Audio Types
mp3, mp4, mpeg, mpga, m4a, wav, ogg, webm

Limitation Features

Cannot Model Conversation Structure
English Language Dominance
File Size Limit
25 MB
Memory Bottleneck in Training
No Internal Billing
No Pre-trained Language Model Use
OpenAI API Key Required
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Web App Only
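
The Supported Audio Types and File Size Limit rows together describe a check that can be run before uploading a file. The following is a minimal sketch, not WhisperUI's actual code: the function name `check_upload` is hypothetical, and it assumes the limit of 25 is measured in megabytes.

```python
from pathlib import Path

# Extensions copied from the "Supported Audio Types" row.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "ogg", "webm"}

# Assumption: the "File Size Limit" value of 25 is in megabytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported audio type: {extension or 'none'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds the size limit")
    return problems
```

Calling `check_upload("interview.mp3", 3_000_000)` returns an empty list, while a file with an unlisted extension or a size above the assumed limit returns one message per problem.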

Other Features

API Key Stored Locally

Pricing Features

Direct API Usage Billing
OpenAI API usage billing
Free Preview
Has Free Tier
Open Source
Apache 2.0
Premium Plan Features
Upload multiple files at once; Unlimited daily file uploads; Transform audio files into SRT files