Cannot Model Conversation Structure
English Language Dominance
Initial Internet Requirement
Memory Bottleneck in Training
No API or Plugin Integration
No Pre-trained Language Model Use
Real-Time Generation Delay
RVQ Time-to-First-Audio Scales Poorly
Supported RAM
M1 Macs with 8 GB of RAM or more