Cannot Model Conversation Structure
English Language Dominance
Max Text Input Characters: 250
Maximum Sound Duration: 60
Memory Bottleneck in Training
Minimum Sound Duration: 10
No API or Plugin Integration
No Pre-trained Language Model Use
Real-Time Generation Delay
RVQ Time-to-First-Audio Scales Poorly