Voice Dictation vs Sesame

A feature-by-feature comparison of Voice Dictation and Sesame

Feature
Voice Dictation
Sesame

Capability Features

Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size
1 million hours
Emotional Intelligence
Evaluation Suite
Local Storage Only
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multiple Speaker Handling
Objective Metrics
Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Pronunciation Correction
Real-time Transcription
Sequence Length
2048
Single-Stage Model
Speech to Text
Subjective Metrics
Comparative Mean Opinion Score
Supported Language List
Afrikaans, Bahasa Indonesia, Bahasa Melayu, Català, Čeština, Dansk, Deutsch, English, Español, Euskara, Filipino, Français, Galego, hrvatski, Isizulu, Íslenska, Italiano, Lietuvių, Magyar, Nederlands, Norsk (Bokmål), Polski, Português, Română, Slovenčina, Slovenščina, Suomi, Svenska, Tiếng Việt, Türkçe, Ελληνικά, Български, Русский, Српски, Українська, עברית, العربية, فارسی, हिन्दी, اُردُو, አማርኛ, Azərbaycanca, বাংলা, ગુજરાતી, ಕನ್ನಡ, ភាសាខ្មែរ, Latviešu, മലയാളം, मराठी, ລາວ, नेपाली भाषा, සිංහල, Basa Sunda, తెలుగు, Kiswahili, ქართული, Հայերեն, தமிழ், ไทย, சிங்கப்பூர், 中文(中国), 中文(台灣), 中文(香港), 日本語, 한국어
Text and Audio Input
Text, Audio
Training Epochs
5
Voice Commands

Integration Features

GitHub Release
Google Speech Recognition
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
Supported Platforms
Google Chrome, Windows, Mac, Linux

Limitation Features

Browser Compatibility
Google Chrome only
Cannot Model Conversation Structure
English Language Dominance
Memory Bottleneck in Training
No API Access
No Export Formats Listed
No Mobile App
No Pre-trained Language Model Use
No Team Collaboration
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Requires Internet Connection

Pricing Features

Free Preview
Free Tier
Open Source
Apache 2.0