Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Global Distribution Focus
Guided Emotion and Intonation
Human-in-the-Loop Labeling
Input Streaming for Lower Latency
Intro/Outro Music Support
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Multilingual Content Translation
Number of Hosts per Episode
4
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Inference Speed
Faster than real-time playback on an A100 40GB GPU for the 3B model
Supported Language List
English, Spanish, French, Hindi, Portuguese, Chinese, German, Japanese, Arabic, Russian, Korean, Indonesian, Italian, Dutch, Turkish, Polish, Swedish, Filipino, Malay, Romanian, Ukrainian, Greek, Czech, Danish, Finnish, Bulgarian, Croatian, Slovak, Tamil
Supported Source Quantity
70+
Supported Source Types
Websites, Blogs, PDFs, PowerPoints, Other Files, Spreadsheets, YouTube Videos
Training Data Volume
100k+ hours of speech, billions of text tokens
User Content Ownership
Users own the content they create