Cannot Model Conversation Structure
English Language Dominance
File Format Support: Not specified
Memory Bottleneck in Training
No Pre-trained Language Model Use
Pricing Plans: Not specified
Real-Time Generation Delay: time-to-first-audio scales poorly with residual vector quantization (RVQ)
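The Real-Time Generation Delay entry does not name a mechanism. One plausible mechanism is stated here as an assumption, not as the source's explanation. In residual vector quantization (RVQ), each codebook level quantizes the residual left by the levels before it, so the levels of a single frame are inherently ordered. The minimal sketch below illustrates that ordering with toy 2-D codebooks in pure Python. The function names (`nearest`, `rvq_encode`, `rvq_decode`) are hypothetical, and this is not the model's actual codec.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    best_idx, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Residual vector quantization of one frame vector.

    Levels run strictly in order: level k quantizes the residual
    that remains after levels 0..k-1 have been subtracted.
    """
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry; the next level sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct a frame by summing the selected entry from every level."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    # Toy codebooks: level 1 is coarse, level 2 refines the residual.
    level1 = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
    level2 = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]]
    codes, leftover = rvq_encode([1.5, 1.0], [level1, level2])
    print(codes, leftover)                       # [1, 1] [0.0, 0.0]
    print(rvq_decode(codes, [level1, level2]))   # [1.5, 1.0]
    print(rvq_decode(codes[:1], [level1]))       # [1.0, 1.0] (coarse only)
```

Assume a generator predicts a frame's K codebook tokens one step at a time. Some models instead predict levels in parallel or with per-level offsets. Under that assumption, the first frame cannot be fully decoded until all K sequential steps finish, so time-to-first-audio grows with the number of codebooks.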