Cannot Model Conversation Structure
Device Requirements
Not supported on iPhone SE (2nd generation); 8 GB or more of RAM recommended on macOS
English Language Dominance
Memory Bottleneck in Training
No Background Transcription on iOS
No Pre-trained Language Model Use
No Real-time Transcription
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly