Sesame vs Voiser AI Transcription & Text‑to‑Speech

Comparing the features of Sesame to Voiser AI Transcription & Text‑to‑Speech

Feature
Sesame
Voiser AI Transcription & Text‑to‑Speech

Capability Features

Accuracy Rate
Up to 100%
Advanced Editor
Automatic Punctuation
ChatGPT Summary Creation
Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Data Protection
Dataset Size
1 million hours
Emotional Intelligence
Evaluation Suite
Flexible Download Options
WordExcelTextSubtitle
Grouping Transcriptions
Language List
Arabic (Algeria)Arabic (Bahrain)Arabic (Egypt)Arabic (Iraq)Arabic (Israel)Arabic (Jordan)Arabic (Kuwait)Arabic (Lebanon)Arabic (Libya)Arabic (Morocco)Arabic (Oman)Arabic (Qatar)Arabic (Saudi Arabia)Arabic (Palestinian)Arabic (Syria)Arabic (Tunisia)Arabic (United Arab Emirates)Arabic (Yemen)Bulgarian (Bulgaria)Catalan (Spain)Chinese (Cantonese, Traditional)Chinese (Mandarin, Simplified)Chinese (Taiwanese Mandarin)Croatian (Croatia)Czech (Czech Republic)Danish (Denmark)Dutch (Netherlands)English (Australia)English (Canada)English (Ghana)English (Hong Kong)English (India)English (Ireland)English (Kenya)English (New Zealand)English (Nigeria)English (Philippines)English (Singapore)English (South Africa)English (Tanzania)English (United Kingdom)English (United States)Estonian(Estonia)Filipino (Philippines)Finnish (Finland)French (Canada)French (France)French (Switzerland)German (Austria)German (Germany)Greek (Greece)Gujarati (Indian)Hebrew (Israel)Hindi (India)Hungarian (Hungary)Indonesian (Indonesia)Irish(Ireland)Italian (Italy)Japanese (Japan)Korean (Korea)Latvian (Latvia)Lithuanian (Lithuania)Malay (Malaysia)Maltese (Malta)Marathi (India)Norwegian (Bokmål, Norway)Polish (Poland)Portuguese (Brazil)Portuguese (Portugal)Romanian (Romania)Russian (Russia)Slovak (Slovakia)Slovenian (Slovenia)Spanish (Argentina)Spanish (Bolivia)Spanish (Chile)Spanish (Colombia)Spanish (Costa Rica)Spanish (Cuba)Spanish (Dominican Republic)Spanish (Ecuador)Spanish (El Salvador)Spanish (Equatorial Guinea)Spanish (Guatemala)Spanish (Honduras)Spanish (Mexico)Spanish (Nicaragua)Spanish (Panama)Spanish (Paraguay)Spanish (Peru)Spanish (Puerto Rico)Spanish (Spain)Spanish (Uruguay)Spanish (USA)Spanish (Venezuela)Swedish (Sweden)Tamil (India)Telugu (India)Thai (Thailand)Turkish (Turkey)Vietnamese (Vietnam)Afrikaans (South Africa)Albanian (Albania)Amharic (Ethiopia)Armenian (Armenia)Azerbaijani (Azerbaijan)Basque (Spain)Bengali (India)Burmese (Myanmar)Czech (Czech)Dutch (Belgium)French (Belgium)Galician (Spain)Georgian (Georgia)German (Switzerland)Icelandic (Iceland)Irish (Ireland)Italian (Switzerland)Javanese (Indonesia)Kannada (India)Kazakh (Kazakhstan)Khmer (Cambodia)Lao (Laos)Macedonian (North Macedonia)Mongolian (Mongolia)Nepali (Nepal)Persian (Iran)Serbian (Serbia)Sinhala (Sri Lanka)Swahili (Kenya)Swahili (Tanzania)Ukrainian (Ukraine)Uzbek (Uzbekistan)Zulu (South Africa)
Model Sizes
Tiny: 1B backbone, 100M decoderSmall: 3B backbone, 250M decoderMedium: 8B backbone, 300M decoder
Multiple Speaker Handling
Objective Metrics
Word Error RateSpeaker SimilarityHomograph DisambiguationPronunciation Consistency
Partial Multilingual Support Planned
Planned for 20+ languages
Profanity Filtering
Pronunciation Correction
Sequence Length
2048
Single-Stage Model
Speaker Identification
Subjective Metrics
Comparative Mean Opinion Score
Subtitle Customization
Subtitle Export
Supported Language List
75
Supported Utilization Areas
Call CentersJournalistsHealthcareLawyersMedia and BroadcastingPodcastsGovernmentResearchersInterviewsStudentsMeetingsSubtitle
Supported Voices
550
Text and Audio Input
TextAudio
Text to Speech
Timestamps
Training Epochs
5
Translation Support
129
User-Friendly Controls
YouTube Link Transcription

Integration Features

Downloadable File Formats
TxtDocxXlsxSrt
File Formats Supported
MP3WAVM4AMOVMP4
GitHub Release
LLama Architecture Backbone
Mimi Split-RVQ Tokenizer

Limitation Features

Cannot Model Conversation Structure
English Language Dominance
Memory Bottleneck in Training
No Explicit Trial Period
No Mention of API
No Mention of Max File Size
No Pre-trained Language Model Use
No Pricing Details Listed
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly

Pricing Features

Free Preview
Free Tier
Open Source
Apache 2.0