Cloud TTS vs Sesame

Comparing the features of Cloud TTS to Sesame

Feature
Cloud TTS
Sesame

Capability Features

Adjustable Speed
Adjustable Volume
Cloud-Based TTS
Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size
1 million hours
Emotional Intelligence
Evaluation Suite
Export Audio Files
File Input Support
Karaoke-Style Highlighting
Language List
Norwegian Bokmål (Norway); American English; European Spanish; Chinese (China); Russian (Russia); Arabic (Saudi Arabia); French (France); German (Germany); Afrikaans (South Africa); Amharic (Ethiopia); Arabic (United Arab Emirates); Arabic (Bahrain); Arabic (Algeria); Arabic (Egypt); Arabic (Iraq); Arabic (Jordan); Arabic (Kuwait); Arabic (Lebanon); Arabic (Libya); Arabic (Morocco); Arabic (Oman); Arabic (Qatar); Arabic (Syria); Arabic (Tunisia); Arabic (Yemen); Azerbaijani (Azerbaijan); Bulgarian (Bulgaria); Bangla (Bangladesh); Bangla (India); Bosnian (Bosnia & Herzegovina); Catalan (Spain); Czech (Czechia); Welsh (United Kingdom); Danish (Denmark); Austrian German; Swiss High German; Greek (Greece); Australian English; Canadian English; British English; English (Hong Kong); English (Ireland); English (India); English (Kenya); English (Nigeria); English (New Zealand); English (Philippines); English (Singapore); English (Tanzania); English (South Africa); Spanish (Argentina); Spanish (Bolivia); Spanish (Chile); Spanish (Colombia); Spanish (Costa Rica); Spanish (Cuba); Spanish (Dominican Republic); Spanish (Ecuador); Spanish (Equatorial Guinea); Spanish (Guatemala); Spanish (Honduras); Mexican Spanish; Spanish (Nicaragua); Spanish (Panama); Spanish (Peru); Spanish (Puerto Rico); Spanish (Paraguay); Spanish (El Salvador); Spanish (United States); Spanish (Uruguay); Spanish (Venezuela); Estonian (Estonia); Persian (Iran); Finnish (Finland); Filipino (Philippines); French (Belgium); Canadian French; Swiss French; Irish (Ireland); Galician (Spain); Gujarati (India); Hebrew (Israel); Hindi (India); Croatian (Croatia); Hungarian (Hungary); Indonesian (Indonesia); Icelandic (Iceland); Italian (Italy); Japanese (Japan); Javanese (Indonesia); Georgian (Georgia); Kazakh (Kazakhstan); Khmer (Cambodia); Kannada (India); Korean (South Korea); Lao (Laos); Lithuanian (Lithuania); Latvian (Latvia); Macedonian (North Macedonia); Malayalam (India); Mongolian (Mongolia); Marathi (India); Malay (Malaysia); Maltese (Malta); Burmese (Myanmar [Burma]); Nepali (Nepal); Flemish; Dutch (Netherlands); Polish (Poland); Pashto (Afghanistan); Brazilian Portuguese; European Portuguese; Romanian (Romania); Sinhala (Sri Lanka); Slovak (Slovakia); Slovenian (Slovenia); Somali (Somalia); Albanian (Albania); Serbian (Serbia); Sundanese (Indonesia); Swedish (Sweden); Swahili (Kenya); Swahili (Tanzania); Tamil (India); Tamil (Sri Lanka); Tamil (Malaysia); Tamil (Singapore); Telugu (India); Thai (Thailand); Turkish (Türkiye); Ukrainian (Ukraine); Urdu (India); Urdu (Pakistan); Uzbek (Uzbekistan); Vietnamese (Vietnam); Chinese (China, LIAONING); Chinese (China, SHAANXI); Chinese (Hong Kong); Chinese (Taiwan); Zulu (South Africa)
Mobile Friendly
Model Sizes
Tiny: 1B backbone, 100M decoder; Small: 3B backbone, 250M decoder; Medium: 8B backbone, 300M decoder
Multiple Speaker Handling
Objective Metrics
Word Error Rate; Speaker Similarity; Homograph Disambiguation; Pronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Pronunciation Correction
Sequence Length
2048
Single-Stage Model
Subjective Metrics
Comparative Mean Opinion Score
Supported Language List
140
Text and Audio Input
Text; Audio
Text Input
Training Epochs
5
User Preferences Persistence
User-Friendly Interface
Voice Selection
Microsoft AvaMultilingual Online (Natural); Microsoft AndrewMultilingual Online (Natural); Microsoft EmmaMultilingual Online (Natural); Microsoft BrianMultilingual Online (Natural); Microsoft Ava Online (Natural); Microsoft Andrew Online (Natural); Microsoft Emma Online (Natural); Microsoft Brian Online (Natural); Microsoft Ana Online (Natural); Microsoft Aria Online (Natural); Microsoft Christopher Online (Natural); Microsoft Eric Online (Natural); Microsoft Guy Online (Natural); Microsoft Jenny Online (Natural); Microsoft Michelle Online (Natural); Microsoft Roger Online (Natural); Microsoft Steffan Online (Natural)

Integration Features

API Availability
GitHub Release
Llama Architecture Backbone
Mimi Split-RVQ Tokenizer
Third-Party API Integration

Limitation Features

Ads
Cannot Model Conversation Structure
English Language Dominance
Memory Bottleneck in Training
No Pre-trained Language Model Use
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly
Usage Limits
Not specified

Pricing Features

Free Preview
Free Tier
Open Source
Apache 2.0
Pricing Plan Details
None
Trial Period