Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Enterprise Privacy Commitment
Expanded Model Support Planned
Guided Emotion and Intonation
Human and Automated Safety Monitoring
Input Streaming for Lower Latency
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
No Training on Data Without Permission
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Reference Client Available
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Audio Inputs/Outputs
Streaming Inference Speed
Faster than playback on an A100 40GB GPU for the 3B model
Supports Text and Audio Inputs
Training Data Volume
100k+ hours of speech, billions of text tokens
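
The "Streaming Inference Speed" entry above says generation is faster than playback. One common way to express this is the real-time factor (RTF): generation time divided by the duration of the audio produced, where a value below 1.0 means audio is generated faster than it plays. The following is a minimal sketch of that arithmetic. The function names and the timing numbers are hypothetical and illustrative only; they are not measurements or APIs from the Orpheus models.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if generation_seconds < 0:
        raise ValueError("generation_seconds must not be negative")
    return generation_seconds / audio_seconds


def is_faster_than_playback(generation_seconds: float, audio_seconds: float) -> bool:
    """True when audio is produced faster than it would take to play back."""
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    # Hypothetical numbers, NOT measured results: 4 s spent generating 10 s of audio.
    rtf = real_time_factor(4.0, 10.0)
    print(f"RTF = {rtf:.2f}, faster than playback: {is_faster_than_playback(4.0, 10.0)}")
```

For streaming use, an RTF below 1.0 is what allows playback to begin before generation finishes without the audio buffer running dry.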