Advanced Vocal Editing
Pronunciation, Pitch, Vibrato, Breathing, Falsetto, Tension, Strength, Emotion
AI Singing Voice Generation
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Genre Variety
Pop, Soul, Latino, Cinematic, Opera, Child Voice, Hip hop, Ballad, R&B, Latin Pop, R&B/Funk, Soul/Funk, Latin Folk
Guided Emotion and Intonation
Highly Editable AI Vocals
Input Streaming for Lower Latency
Language/Country Support
English, Spanish, Chinese, Japanese
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
No Custom Model Training for VoiceMix
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Royalty-Free Commercial Use
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Inference Speed
Faster than real-time playback on an A100 40GB GPU (3B model)
Training Data Volume
100k+ hours of speech, billions of text tokens
Voice Designer / VoiceMix