CosyVoice 2.0-0.5B
Multilingual Support
- Supported Languages: Chinese, English, Japanese, Korean, and Chinese dialects (Cantonese, Sichuanese, Shanghainese, Tianjinese, Wuhanese, etc.)
- Cross-lingual & Mixed-lingual: Supports zero-shot voice cloning for cross-language and code-switching scenarios.
Ultra-Low Latency
- Bidirectional Streaming Support: CosyVoice 2.0 integrates offline and streaming modeling technologies.
- Rapid First Packet Synthesis: Achieves latency as low as 150ms while maintaining high-quality audio output.
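First-packet latency is the time from submitting a request until the first audio chunk is available. The sketch below shows one way to measure it for any streaming synthesizer that yields chunks. It does not use the CosyVoice API: `fake_streaming_tts` is a hypothetical stand-in, and `first_packet_latency` is an illustrative helper name, so swap in a real streaming call to obtain meaningful numbers.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def first_packet_latency(make_stream: Callable[[], Iterable[bytes]]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total number of chunks)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _chunk in make_stream():
        if first is None:
            # Record the elapsed time only once, when the first chunk shows up.
            first = time.perf_counter() - start
        count += 1
    if first is None:
        raise ValueError("stream produced no audio chunks")
    return first, count


def fake_streaming_tts(n_chunks: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical synthesizer (an assumption): sleeps, then yields silent PCM bytes."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 320


if __name__ == "__main__":
    latency, chunks = first_packet_latency(lambda: fake_streaming_tts())
    print(f"first packet after {latency * 1000:.1f} ms, {chunks} chunks in total")
```

The timer starts before the generator is created, so any setup cost inside the synthesizer call counts toward the measured latency, which matches what a caller would experience.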
High Accuracy
- Improved Pronunciation: Reduces pronunciation errors by 30% to 50% compared to CosyVoice 1.0.
- Benchmark Achievements: Attains the lowest character error rate on the hard test set of the Seed-TTS evaluation set.
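For readers unfamiliar with the metric, character error rate (CER) is commonly defined as the character-level edit distance between a reference transcript and a recognized transcript, divided by the reference length. The sketch below assumes that standard definition and skips text normalization; the README does not state how the reported figures were computed, and the function names are illustrative. `relative_reduction` shows the arithmetic behind a statement such as "reduces errors by 30% to 50%".

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up example strings, not benchmark data.
    print(character_error_rate("你好世界", "你好视界"))
    print(relative_reduction(0.10, 0.06))
```

In a real evaluation the hypothesis transcript would come from running a speech recognizer on the synthesized audio, and both strings would be normalized (punctuation, casing, whitespace) before scoring.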
Strong Stability
- Timbre Consistency: Ensures reliable voice consistency for zero-shot and cross-language speech synthesis.
- Cross-language Synthesis: Shows significant improvements compared to version 1.0.
Natural Experience
- Enhanced Prosody and Sound Quality: Improves the alignment of synthesized audio, raising MOS evaluation scores from 5.4 to 5.53.
- Emotional and Dialectal Flexibility: Supports finer-grained emotion control and accent adjustment.