Controllability and Security
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Full Model Training Hours
100000
Guided Emotion and Intonation
High-Fidelity Speech Synthesis
Input Streaming for Lower Latency
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Open Source Model Training Hours
40000
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sample Rate for Audio Output
24000 Hz
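As a rough illustration of what a 24000 Hz output rate means in practice, the sketch below packs raw samples into a WAV container using only the Python standard library. The listing states only the sample rate, so mono 16-bit PCM is an assumption here, and the sine tone is a stand-in for model output. The helper names (`sine_pcm16`, `to_wav_bytes`, `wav_duration_seconds`) are hypothetical and are not part of any Orpheus API.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24000  # Hz, taken from the listing


def sine_pcm16(freq_hz: float, seconds: float, rate: int = SAMPLE_RATE) -> bytes:
    """Generate a placeholder tone as little-endian 16-bit PCM (assumed format)."""
    n = int(seconds * rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def to_wav_bytes(pcm: bytes, rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw PCM in a WAV header; mono and 2 bytes per sample are assumptions."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the duration back from the header: frames divided by frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


if __name__ == "__main__":
    wav = to_wav_bytes(sine_pcm16(440.0, 0.5))
    print(f"{wav_duration_seconds(wav):.2f} s of audio at {SAMPLE_RATE} Hz")
```

Under these assumptions, one second of audio is 24000 samples, or 48000 bytes of PCM payload.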
Sliding Window Detokenizer
Streaming Inference Speed
Faster than playback on A100 40GB for 3B model
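"Faster than playback" is commonly expressed as a real-time factor (RTF): generation time divided by the duration of the audio produced, with values below 1.0 meaning the audio is generated faster than it plays. The sketch below shows that arithmetic at the 24000 Hz output rate. The function names and the timing figures in the usage lines are hypothetical, not measured results for any model.

```python
SAMPLE_RATE = 24000  # Hz, taken from the listing


def real_time_factor(generation_seconds: float, num_samples: int,
                     rate: int = SAMPLE_RATE) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    audio_seconds = num_samples / rate
    return generation_seconds / audio_seconds


def is_faster_than_playback(generation_seconds: float, num_samples: int,
                            rate: int = SAMPLE_RATE) -> bool:
    """True when audio is produced faster than it would take to play back."""
    return real_time_factor(generation_seconds, num_samples, rate) < 1.0


if __name__ == "__main__":
    # Hypothetical numbers: 2 seconds of audio (48000 samples) generated in 1.5 s.
    rtf = real_time_factor(1.5, 48000)
    print(f"RTF = {rtf:.2f}, streaming-capable: {is_faster_than_playback(1.5, 48000)}")
```

For streaming, the RTF needs to stay below 1.0 on the target hardware so that the playback buffer does not run dry.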
Training Data Volume
100k+ hours of speech, billions of text tokens
Voice Customization Options