Cannot Model Conversation Structure
English Language Dominance
Maximum Audio Length per File
10 hours
Maximum Upload Duration per File
10 hours
Memory Bottleneck in Training
No Pre-trained Language Model Use
Premium Required for Export
Real-Time Generation Delay
RVQ Time-to-First-Audio Scales Poorly