Audio and Video Processing
Audio Metadata Extraction
Daily Audio Processing
2,500,000
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Energy Classification
high
Genre Classification
rock
Guided Emotion and Intonation
Higher SDR Performance
15.8% higher average SDR
Input Streaming for Lower Latency
Instrument Types Supported
bassGuitar, percussion, electricGuitar
Large Scale Audio Processing
1,000,000,000
LLM-based Customizability
Lyric & Speech Transcription
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Mood Classification
energetic
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Inference Speed
Faster than playback on A100 40GB for 3B model
Training Data Volume
100k+ hours of speech, billions of text tokens
Translation & Localization