Ambient Sounds Support
Typing, Noisy Restaurant, Doorbell Ring, Television Playing, Cooking, Street
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
Guided Emotion and Intonation
Human Sound Effects
Baby Laughing, Clapping, Celebrate, Footsteps, Burping, Chattering
Input Streaming for Lower Latency
Instrument Sounds
Piano, Electric Guitar, Violin, Irish Uilleann Pipes, Electric Keyboard, Backing Track
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Nature Sound Effects
Rain, Ocean Waves and Bird, Flowing Water, Insect Chirping, Thunder and Lightning, Dog Barking
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Popular Sound Effect Categories
Cicada, Fart, Explosion, Vine Boom, Metal Pipe, Funny, Rain, Ding, Rizz, Screech Owl, Bleep, Baby Crying, Gunshot, Phone Ringing, Uwu, Alarm, Dun Dun Dun, Cricket, Wind, Pop, Ambient, Applause, Boing, Scream, Thunder, Womp Womp, Whoosh, Goat, Ocean, Punch, Drum Roll, Cartoon, Bell, Bonk, Mosquitos, Whistle
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Sample Finetuning Scripts
Sliding Window Detokenizer
Special Effects Support
Fireworks, Glass Shattering, Magic, Spaceship, Action, Gunshot
Streaming Inference Speed
Faster than real-time playback on an A100 40GB GPU for the 3B model
Text-to-Speech Generation
Training Data Volume
100k+ hours of speech, billions of text tokens
Use Cases Information
Video Making, Game Making, Music Production, Virtual Reality, Meditation Apps, Cinema, Podcasts, Live Performances
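
Note on Streaming Inference Speed
The "faster than playback" entry can be read as a real-time factor (RTF) below 1, where RTF is the time spent generating audio divided by the duration of the audio produced. The sketch below only illustrates that arithmetic. The function names and the sample numbers are hypothetical and are not measurements of any Orpheus model.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def is_faster_than_playback(generation_seconds: float, audio_seconds: float) -> bool:
    """True when audio is produced more quickly than it would take to play back."""
    return real_time_factor(generation_seconds, audio_seconds) < 1.0


# Hypothetical timings for illustration only (not benchmark results):
# 6 seconds of compute to produce 10 seconds of speech.
rtf = real_time_factor(6.0, 10.0)
print(f"RTF = {rtf:.2f}, faster than playback: {is_faster_than_playback(6.0, 10.0)}")
```

An RTF below 1 means a streaming client can start playing audio while generation is still running without the playback buffer running dry, assuming the generation rate stays steady.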