Sesame vs Voiser

Comparing the features of Sesame to Voiser

Feature
Sesame
Voiser

Capability Features

AR/VR Support
Automatic Punctuation
Available Voices
550
Batch Processing
Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Country Coverage
200
Dataset Size
1 million hours
Dialects Supported
135
Emotion Voice Options
Emotional Intelligence
Evaluation Suite
Export Formats
Word, Excel, Txt, Srt
Languages Supported
75
Minimum Accuracy Claim
99.9% success rate
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multiple Speaker Handling
Objective Metrics
Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Online Dictation
Partial Multilingual Support Planned
Planned for 20+ languages
Profanity Filtering
Pronunciation Correction
Sequence Length
2048
Single-Stage Model
Smart Guide
Speaker Identification
Speech-to-Text
Subjective Metrics
Comparative Mean Opinion Score
Subtitle Customization
Talking Avatar
Text and Audio Input
Text, Audio
Text to Speech
Text-to-Video
Training Epochs
5
Voice Cloning
Voice Quality Levels
HD, HQ, UHD
YouTube Dubbing

Integration Features

API Access
ChatGPT Integration
Email Login
Facebook Login
File Format Support (Audio)
.mp3, .wav, .flac, .aac, .wma, .ogg, .aiff
File Format Support (Video)
.avi, .mp4, .mov, .webm, .mpeg, .3gp
GitHub Release
Google Login
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
URL Import Support
WordPress Integration
YouTube Import

Limitation Features

Cannot Model Conversation Structure
English Language Dominance
Maximum Usage Without Payment
50 characters for TTS, 5 minutes for STT
Memory Bottleneck in Training
No Pre-trained Language Model Use
Premium Voices (Enterprise)
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Studio Free Limit
50 characters
Transcription Limit Free Tier
5 minutes

Pricing Features

Free Preview
Free Tier
Open Source
Apache 2.0
Quota Extension Via Purchase