Audio Duration Limit: 20 minutes (standard), 2 hours (Pro)
Cannot Model Conversation Structure
English Language Dominance
Interactive Demo Recording Limit: 1 minute
Internet Connection Required
Memory Bottleneck in Training
No Pre-trained Language Model Use
Real-Time Generation Delay
RVQ Time-to-First-Audio Scales Poorly