Customization Options
Voice type, Pitch, Speed, Background music, Accent, Sentence breaks, Volume, Tone, Stress areas
Device Compatibility
Any device
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Guided Emotion and Intonation
Input Streaming for Lower Latency
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pitch Adjustment
Up to 20 semitones higher or lower
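To give a sense of what a 20-semitone range means in frequency terms, here is a minimal sketch, assuming standard 12-tone equal temperament (each semitone multiplies frequency by 2^(1/12)). The function names are hypothetical and are not part of any product API; how the pitch shift is actually implemented is not stated in the source.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of `semitones`.

    Assumes 12-tone equal temperament: 12 semitones = one octave = ratio 2.
    """
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(base_hz: float, semitones: float) -> float:
    """Frequency in Hz after shifting `base_hz` by `semitones` (hypothetical helper)."""
    return base_hz * semitone_ratio(semitones)


if __name__ == "__main__":
    # Show the extremes of a +/-20 semitone range for an illustrative 200 Hz voice.
    for shift in (-20, -12, 0, 12, 20):
        print(f"{shift:+d} semitones -> {shifted_frequency(200.0, shift):.1f} Hz")
```

Under this assumption, a shift of 20 semitones corresponds to a frequency ratio of roughly 3.17 upward, or roughly 0.31 downward.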
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sentence Breaks and Punctuation Recognition
Sliding Window Detokenizer
Streaming Inference Speed
Faster than real-time playback on an A100 40GB GPU for the 3B model
Stress and Emphasis Control
Training Data Volume
100k+ hours of speech, billions of text tokens
Types of Voices
Basic, Standard, Neural, Cloned
Unlimited Access to Basic Voices
Unlimited Basic Voice-Overs
Use Cases Information
Video Sales Letters, Educational Videos, Marketing Videos, Animated Videos, Audio Books, Explainer Videos, Podcasts, Websites
Uses IBM, Azure, Google, and Amazon TTS
IBM, Azure AI, Google Text to Speech, Amazon