Cannot Model Conversation Structure
English Language Dominance
Maximum Usage Without Payment: 50 characters for TTS, 5 minutes for STT
Memory Bottleneck in Training
No Pre-trained Language Model Use
Premium Voices (Enterprise)
Real-Time Generation Delay: RVQ time-to-first-audio scales poorly
Transcription Limit Free Tier: 5