Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Guided Emotion and Intonation
Input Streaming for Lower Latency
Karaoke-Style Highlighting
Language List
Norwegian Bokmål (Norway); American English; European Spanish; Chinese (China); Russian (Russia); Arabic (Saudi Arabia); French (France); German (Germany); Afrikaans (South Africa); Amharic (Ethiopia); Arabic (United Arab Emirates); Arabic (Bahrain); Arabic (Algeria); Arabic (Egypt); Arabic (Iraq); Arabic (Jordan); Arabic (Kuwait); Arabic (Lebanon); Arabic (Libya); Arabic (Morocco); Arabic (Oman); Arabic (Qatar); Arabic (Syria); Arabic (Tunisia); Arabic (Yemen); Azerbaijani (Azerbaijan); Bulgarian (Bulgaria); Bangla (Bangladesh); Bangla (India); Bosnian (Bosnia & Herzegovina); Catalan (Spain); Czech (Czechia); Welsh (United Kingdom); Danish (Denmark); Austrian German; Swiss High German; Greek (Greece); Australian English; Canadian English; British English; English (Hong Kong); English (Ireland); English (India); English (Kenya); English (Nigeria); English (New Zealand); English (Philippines); English (Singapore); English (Tanzania); English (South Africa); Spanish (Argentina); Spanish (Bolivia); Spanish (Chile); Spanish (Colombia); Spanish (Costa Rica); Spanish (Cuba); Spanish (Dominican Republic); Spanish (Ecuador); Spanish (Equatorial Guinea); Spanish (Guatemala); Spanish (Honduras); Mexican Spanish; Spanish (Nicaragua); Spanish (Panama); Spanish (Peru); Spanish (Puerto Rico); Spanish (Paraguay); Spanish (El Salvador); Spanish (United States); Spanish (Uruguay); Spanish (Venezuela); Estonian (Estonia); Persian (Iran); Finnish (Finland); Filipino (Philippines); French (Belgium); Canadian French; Swiss French; Irish (Ireland); Galician (Spain); Gujarati (India); Hebrew (Israel); Hindi (India); Croatian (Croatia); Hungarian (Hungary); Indonesian (Indonesia); Icelandic (Iceland); Italian (Italy); Japanese (Japan); Javanese (Indonesia); Georgian (Georgia); Kazakh (Kazakhstan); Khmer (Cambodia); Kannada (India); Korean (South Korea); Lao (Laos); Lithuanian (Lithuania); Latvian (Latvia); Macedonian (North Macedonia); Malayalam (India); Mongolian (Mongolia); Marathi (India); Malay (Malaysia); Maltese (Malta); Burmese (Myanmar [Burma]); Nepali (Nepal); Flemish; Dutch (Netherlands); Polish (Poland); Pashto (Afghanistan); Brazilian Portuguese; European Portuguese; Romanian (Romania); Sinhala (Sri Lanka); Slovak (Slovakia); Slovenian (Slovenia); Somali (Somalia); Albanian (Albania); Serbian (Serbia); Sundanese (Indonesia); Swedish (Sweden); Swahili (Kenya); Swahili (Tanzania); Tamil (India); Tamil (Sri Lanka); Tamil (Malaysia); Tamil (Singapore); Telugu (India); Thai (Thailand); Turkish (Türkiye); Ukrainian (Ukraine); Urdu (India); Urdu (Pakistan); Uzbek (Uzbekistan); Vietnamese (Vietnam); Chinese (China, Liaoning); Chinese (China, Shaanxi); Chinese (Hong Kong); Chinese (Taiwan); Zulu (South Africa)
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Inference Speed
Faster than real-time playback on an A100 40GB GPU for the 3B model
Supported Language List
140
Training Data Volume
100k+ hours of speech, billions of text tokens
User Preferences Persistence
Voice Selection
Microsoft AvaMultilingual Online (Natural), Microsoft AndrewMultilingual Online (Natural), Microsoft EmmaMultilingual Online (Natural), Microsoft BrianMultilingual Online (Natural), Microsoft Ava Online (Natural), Microsoft Andrew Online (Natural), Microsoft Emma Online (Natural), Microsoft Brian Online (Natural), Microsoft Ana Online (Natural), Microsoft Aria Online (Natural), Microsoft Christopher Online (Natural), Microsoft Eric Online (Natural), Microsoft Guy Online (Natural), Microsoft Jenny Online (Natural), Microsoft Michelle Online (Natural), Microsoft Roger Online (Natural), Microsoft Steffan Online (Natural)