Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Guided Emotion and Intonation
Input Streaming for Lower Latency
Open Source Release Planned
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
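As a rough illustration of how the tags listed above might be attached to input text, the sketch below checks a tag against that list and prepends it to a prompt. The `<tag>` angle-bracket markup and the helper name `tag_prompt` are assumptions made for illustration; the list itself does not specify how tags are written in the input.

```python
# Tag names come from the list above; the "<tag>" markup is an assumed,
# hypothetical syntax, not a documented input format.
EMOTION_TAGS = ("normal", "slow", "crying", "sleepy", "sigh", "chuckle")


def tag_prompt(text: str, tag: str = "normal") -> str:
    """Prefix `text` with an emotion tag, rejecting tags not in the list."""
    if tag not in EMOTION_TAGS:
        raise ValueError(f"unknown emotion tag: {tag!r}")
    return f"<{tag}> {text.strip()}"


if __name__ == "__main__":
    print(tag_prompt("I can't believe it's already Friday.", "sigh"))
```

Validating against a fixed tuple keeps typos such as `sleepy ` or `chuckles` from silently reaching the model as plain text.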
Training Data Volume
100k+ hours of speech, billions of text tokens
LLM-based Customizability
Streaming Inference Speed
Faster than real-time playback on an A100 40GB for the 3B model
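"Faster than playback" is commonly expressed as a real-time factor (RTF) below 1: the time spent generating audio divided by the duration of the audio produced. The sketch below shows that arithmetic; the function names are hypothetical and the numbers in the usage line are made up for illustration, not measurements of any Orpheus model.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def is_faster_than_playback(generation_seconds: float, audio_seconds: float) -> bool:
    """True when audio is produced more quickly than it takes to play back."""
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    # Illustrative values only: 5 s of compute for 10 s of audio.
    print(real_time_factor(5.0, 10.0), is_faster_than_playback(5.0, 10.0))
```

For streaming, an RTF below 1 means the listener should not run out of audio once playback has started, provided generation speed stays steady.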
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Sliding Window Detokenizer
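The list only names the sliding window detokenizer, so the following is a generic sketch of how such a scheme can work, not the Orpheus implementation. It decodes overlapping windows of audio tokens and emits only the samples belonging to tokens that have not been emitted yet, so each new chunk is decoded with earlier tokens as context. The function name, the `decode` callable, and the `window`, `hop`, and `samples_per_token` parameters are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence


def sliding_window_decode(
    tokens: Iterable[int],
    decode: Callable[[List[int]], Sequence[float]],
    window: int = 4,
    hop: int = 1,
    samples_per_token: int = 1,
) -> Iterator[Sequence[float]]:
    """Yield audio chunks decoded from overlapping windows of tokens.

    `decode` is a stand-in for a real detokenizer and must return
    `samples_per_token` samples for each token in the window it is given.
    """
    if not 1 <= hop <= window:
        raise ValueError("hop must satisfy 1 <= hop <= window")
    buf: deque = deque(maxlen=window)  # most recent `window` tokens
    pending = 0  # tokens received but not yet emitted as audio
    for tok in tokens:
        buf.append(tok)
        pending += 1
        if len(buf) == window and pending >= hop:
            samples = decode(list(buf))
            # Keep only the samples for tokens that are new since the last emit.
            yield samples[-pending * samples_per_token:]
            pending = 0
    if pending:
        # Flush tokens left over when the stream ends between hops.
        samples = decode(list(buf))
        yield samples[-pending * samples_per_token:]


if __name__ == "__main__":
    toy_decode = lambda win: [t * 10 for t in win]  # context-free toy decoder
    for chunk in sliding_window_decode(range(7), toy_decode, window=4, hop=2):
        print(chunk)
```

With the toy decoder the overlap has no audible effect; with a context-dependent decoder, re-decoding earlier tokens alongside the new ones is what would smooth the boundaries between chunks.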