Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Guided Emotion and Intonation
In-Person Conversation Translation
Input Streaming for Lower Latency
Language and Accent Variants
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
No App Needed for Invitees
No Need for Human Interpreter
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Real-Time Translation Speed
Slight delay
Sample Finetuning Scripts
Sliding Window Detokenizer
Speech to Speech Translation
Streaming Inference Speed
Faster than playback on A100 40GB for 3B model
Supported Languages
عربي, Deutsch, English, Español, Français, Indonesia, 日本語, 한국인, Русский, แบบไทย, Tiếng Việt, Português, Polski, Українська, 简体中文, 繁体中文
Supports Daily Life, Travel, and Business
Immigrants, Travelers, Business
Supports Multiple Languages
150
Total Supported Languages
150
Training Data Volume
100k+ hours of speech, billions of text tokens