Applicable User Groups
Foreign Trade Sales/Purchasing, Multinational company employees, Cross-border freelancers, International students, Travelers
Emotion Tags
normal, slow, crying, sleepy, sigh, chuckle
End-to-End AI Speech Model
Guided Emotion and Intonation
Input Streaming for Lower Latency
Language Pair Translation
English-Japanese, Japanese-English, Japanese-Chinese
List of Supported Languages
60
LLM-based Customizability
Model Tokenizer Type
Non-streaming (CNN-based) tokenizer
Multi-Scenario Usage
Online meetings, Offline communication, Travel scenarios, Business meetings, Foreign trade communication, Face-to-face exhibition communication, Study abroad lectures, Travel meal ordering
Open Source Release Planned
Orpheus Speech Models
Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Pretrained and Finetuned Models
Pretrained models, Finetuned models
Professional Vocabulary Learning
Real-Time Speech Transcription
Sample Finetuning Scripts
Sliding Window Detokenizer
Streaming Inference Speed
Faster than playback on A100 40GB for 3B model
Supported Languages
Chinese, English, Japanese, Korean, Cantonese, German, French, Russian, Italian, Spanish, Thai, Vietnamese
Training Data Volume
100k+ hours of speech, billions of text tokens
Voice Streaming and Interruption