jichengdu / fish-speech

Fish Speech V1.5: SOTA Open Source TTS


Run time and cost

This model runs on Nvidia L40S GPU hardware.

Readme

Fish Speech V1.5: version 1.5 of the model released by fish.audio.

Disclaimer: This is an unofficial implementation. Please refer to the fishspeech repository for the original code and details. By using this model, you agree to the terms stated in that repository, which may change at any time. Make sure you comply with these terms before using the model.

FishSpeech: Advanced Speech Synthesis Technology

Key Features

Zero-shot & Few-shot TTS

  • Generate high-quality speech that closely resembles the reference voice from just 10-30 seconds of sample audio
  • Quickly achieve personalized voice cloning without extensive training data
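
Since the feature list quotes a 10-30 second reference sample, it can help to check a clip's length before submitting it. The following is a minimal sketch using only the Python standard library. The helper names (wav_duration_seconds, is_suitable_reference, make_silent_wav) are hypothetical and are not part of Fish Speech or of this model's API. The sketch assumes the reference clip is an uncompressed WAV file.

```python
import io
import wave

# Range quoted in the feature list above. These helpers are illustrative only.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(source):
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(source):
    """Return True if the clip length falls inside the 10-30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, used here for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for seconds in (5, 15, 45):
        clip = make_silent_wav(seconds)
        print(seconds, "s ->", is_suitable_reference(clip))
```

In practice a file path can be passed instead of the in-memory buffer, for example is_suitable_reference("speaker.wav"), where speaker.wav is a placeholder name.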

Excellent Bilingual Support

  • Supports both Chinese and English, with seamless switching between the two languages
  • Paste Chinese or English text into the input box; the system detects the language and processes it automatically
  • Natural and fluent reading of mixed Chinese-English text without additional settings

No Phoneme Dependency

  • Removes the dependency on phonemes found in traditional TTS systems
  • The model has strong language understanding and generalization capabilities
  • Directly processes text without complex phoneme conversion procedures

Superior Accuracy Performance

  • Achieves approximately 2% Character Error Rate (CER) and Word Error Rate (WER) on 5-minute English text tests
  • Equally high accuracy on Chinese text, with clear and natural pronunciation
  • Significantly reduces pronunciation errors and unnatural pauses common in traditional TTS systems
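
The page does not say how the CER and WER figures were measured. A common approach is to transcribe the synthesized audio with a speech recognizer and compare the transcript with the input text. The sketch below shows only the comparison step, using the standard definition: edit distance divided by the length of the reference. The function names are illustrative, and stripping spaces before computing CER is an assumption made here so that the same function works for Chinese text.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """WER: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """CER: character-level edit distance over reference characters, spaces ignored."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    text = "the quick brown fox jumps"
    transcript = "the quick brown box jumps"  # one wrong word out of five
    print(word_error_rate(text, transcript))  # 0.2
    print(character_error_rate("今天天气很好", "今天天气真好"))  # one wrong character out of six
```

Under this definition, a 2% WER corresponds to roughly one word error per 50 reference words.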

Multi-scenario Applications

  • Personalized voice assistant customization
  • Audiobook and podcast production
  • Video dubbing and game character voices
  • Educational and assistive technology applications
