Canopy Labs vs Sesame

Comparing the features of Canopy Labs and Sesame

Capability Features

Consistent Personality
Context Awareness
Conversational Dynamics
Conversational Speech Generation
Dataset Size: 1 million hours
Demo Availability
Emotion Tags: normal, slow, crying, sleepy, sigh, chuckle
Emotional Intelligence
Evaluation Suite
Guided Emotion and Intonation
Handles Disfluencies
Input Streaming for Lower Latency
Llama Architecture: Llama
LLM-based Customizability
Model Sizes: Tiny (1B backbone, 100M decoder), Small (3B backbone, 250M decoder), Medium (8B backbone, 300M decoder)
Model Tokenizer Type: non-streaming (CNN-based) tokenizer
Multiple Speaker Handling
Objective Metrics: Word Error Rate, Speaker Similarity, Homograph Disambiguation, Pronunciation Consistency
Open Source Release Planned
Orpheus Speech Models: Medium (3B), Small (1B), Tiny (400M), Nano (150M)
Partial Multilingual Support Planned: planned for 20+ languages
Pretrained and Finetuned Models: pretrained models, finetuned models
Pronunciation Correction
Realtime Streaming
Sample Finetuning Scripts
Sequence Length: 2048
Single-Stage Model
Sliding Window Detokenizer
Streaming Inference Speed: faster than realtime playback on an A100 40GB for the 3B model (see the streaming sketch after this list)
Subjective Metrics: Comparative Mean Opinion Score (CMOS)
Text and Audio Input: text, audio
Text to Speech
Training Data Volume: 100k+ hours of speech, billions of text tokens
Training Epochs: 5
Zero-Shot Voice Cloning
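
For the emotion-tag and realtime-streaming rows above, here is a minimal streaming sketch on the Canopy Labs side. It assumes the orpheus-speech Python package exposes an OrpheusModel class whose generate_speech method yields 16-bit, 24 kHz PCM chunks, and that the model id, voice name, and chuckle-tag syntax match the Orpheus project README; treat all of those names as assumptions rather than a verified API reference.

```python
# Minimal streaming sketch. Assumed API: the orpheus-speech package's
# OrpheusModel / generate_speech interface; the model id, voice name, and
# <chuckle> tag syntax are assumptions, not verified here.
import wave

from orpheus_tts import OrpheusModel  # pip install orpheus-speech (assumed package name)

model = OrpheusModel(model_name="canopylabs/orpheus-tts-0.1-finetune-prod")

# Emotion tags are written inline in the prompt text.
prompt = "I can't believe it actually worked on the first try <chuckle>."

# generate_speech is assumed to return a generator of raw 16-bit PCM chunks,
# which is what makes faster-than-playback streaming possible on a single GPU.
chunks = model.generate_speech(prompt=prompt, voice="tara")

with wave.open("output.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(24000)  # 24 kHz output (assumed)
    for chunk in chunks:
        wf.writeframes(chunk)  # a live app would play each chunk as it arrives
```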

Integration Features

Baseten 1-Click Deployment
GitHub Release
GitHub Repository Access
Google Colab Notebook
Hugging Face Model Access
Llama Architecture Backbone
Llama Ecosystem Support (see the loading sketch after this list)
Mimi Split-RVQ Tokenizer
Python Package for Streaming
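
Because both systems build on Llama-style backbones and publish weights on Hugging Face, the checkpoints load with standard transformers tooling. The sketch below is for the Canopy Labs side and assumes the repo id canopylabs/orpheus-3b-0.1-ft and a plain-text prompt; the exact repo id and prompt format (voice prefix, special tokens) are assumptions. It only shows that the checkpoint loads as an ordinary causal LM; turning the generated audio-codec tokens back into a waveform requires the model's audio detokenizer and is not shown.

```python
# Loading sketch: the repo id below is an assumption; substitute the actual
# Hugging Face id of the checkpoint you want, and check its prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "canopylabs/orpheus-3b-0.1-ft"  # assumed Orpheus finetuned checkpoint id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Because the backbone is a standard Llama-style causal LM, the usual ecosystem
# tooling (transformers, PEFT/LoRA finetuning, vLLM serving) applies unchanged.
inputs = tokenizer("Hello there!", return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=256)

# For this model the generated ids are audio-codec tokens, not text; decoding
# them into audio requires the accompanying detokenizer (not shown here).
print(generated_ids.shape)
```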

Limitation Features

Cannot Model Conversation Structure
English Language Dominance
English Language Only
Memory Bottleneck in Training
No API Mentioned
No Explicit Pricing Details
No Mention of File Format Support
No Pre-trained Language Model Use
Real-Time Generation Delay
RVQ time-to-first-audio scales poorly (see the latency sketch after this list)
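
The last row is a latency argument rather than a missing feature: with a residual vector quantization (RVQ) codec, several codebook tokens have to be produced before even the first audio frame can be decoded, so time-to-first-audio grows with codebook depth. A back-of-the-envelope sketch follows; the per-token latency and codebook counts are illustrative assumptions, not measured figures for either system.

```python
# Back-of-the-envelope time-to-first-audio (TTFA) for an RVQ-style codec.
# All numbers are illustrative assumptions, not benchmarks of either system.

def ttfa_seconds(codebooks_per_frame: int, seconds_per_token: float) -> float:
    """Lower bound on TTFA when every codebook token of the first frame
    must be generated sequentially before that frame can be decoded."""
    return codebooks_per_frame * seconds_per_token

SECONDS_PER_TOKEN = 0.005  # assumed 5 ms per autoregressive step

for codebooks in (1, 4, 8, 32):
    wait_ms = ttfa_seconds(codebooks, SECONDS_PER_TOKEN) * 1000
    print(f"{codebooks:>2} codebooks per frame -> at least {wait_ms:.0f} ms before the first frame")

# The table row's point: if decoding must wait for all codebooks of a frame,
# TTFA scales linearly with RVQ depth, which hurts realtime responsiveness.
```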

Pricing Features

Free Preview
Open Source: Apache 2.0