AI Voice Cloning vs Sesame

Comparing the features of AI Voice Cloning to Sesame

Feature
AI Voice Cloning
Sesame

Capability Features

Audio Download
Audio Input Methods
Record Audio, Upload Audio
Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size
1 million hours
Emotional Intelligence
Evaluation Suite
Future Style Controls
Minimum Clone Audio Length
3 seconds
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multiple Speaker Handling
Objective Metrics
Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Planned Language Expansion
Privacy and Security
Pronunciation Correction
Recommended Clone Audio Length
3-10 seconds
Sequence Length
2048
Single-Stage Model
Subjective Metrics
Comparative Mean Opinion Score
Support Contact Email
support@aivoicecloning.io
Supported Language List
English, Mandarin, Japanese, Korean
Text and Audio Input
Text, Audio
Training Epochs
5
User-Friendly Interface
Web Platform

Integration Features

GitHub Release
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
Supported Audio Types
MP3, WAV

Limitation Features

Cannot Model Conversation Structure
Commercial Use Restrictions
Consent Required for Voice Cloning
English Language Dominance
Free Tier Generation Speed
slower
Memory Bottleneck in Training
No API Currently
No Pre-trained Language Model Use
Personal Use Only on Free
Prohibited Use Cases
Impersonation, Fraud, Hate Speech, Spam
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Single Speaker Input
Voice Customization

Pricing Features

Commercial Use Premium
Free Preview
Free Tier
Free Tier Usage Limit
1200
Open Source
Apache 2.0
Premium Unlimited Generation
Trial Period