Documentation Video Automation
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Guided Emotion and Intonation
Images and Audio to Video
Industry Use Cases
Announcements, Podcasts, Language Lessons, Audiobooks, Video Lecture, Property Rental, Corporate Training, Screencasts
Input Streaming for Lower Latency
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Scripted Stage Directions
Sliding Window Detokenizer
Social Media Templates
Instagram, LinkedIn, Facebook, Twitter
Streaming Inference Speed
Faster than playback on A100 40GB for 3B model
Supported Language List
100
Training Data Volume
100k+ hours of speech, billions of text tokens
Voiceover Synchronization