dessix/moss-ttsd

MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken dialogue synthesis model that supports both Chinese and English. It can transform dialogue scripts between two speakers into natural, expressive conversational speech.

Run time and cost

This model costs approximately $0.13 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia H100 GPU hardware. Predictions typically complete within 85 seconds. The predict time for this model varies significantly based on the inputs.
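The figures above can be turned into a rough budget estimate. The following is a minimal sketch that assumes the quoted approximate price ($0.13 per run) and typical completion time (85 seconds) hold for your inputs, which the page itself says may not be the case. The function names are hypothetical helpers and are not part of Replicate's API.

```python
# Approximate figures quoted on the model page; both vary with inputs.
COST_PER_RUN_USD = 0.13
TYPICAL_RUN_SECONDS = 85


def runs_per_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in a budget at the quoted price.

    Works in integer cents to avoid floating-point rounding surprises.
    """
    budget_cents = int(round(budget_usd * 100))
    cost_cents = int(round(cost_per_run * 100))
    return budget_cents // cost_cents


def estimated_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Approximate total cost in USD for n_runs predictions."""
    return round(n_runs * cost_per_run, 2)


def implied_hourly_rate(cost_per_run=COST_PER_RUN_USD,
                        run_seconds=TYPICAL_RUN_SECONDS):
    """Hourly rate implied by the two quoted figures (a derived estimate only)."""
    return cost_per_run / run_seconds * 3600


if __name__ == "__main__":
    print(runs_per_budget(1.00))   # matches the "7 runs per $1" figure
    print(estimated_cost(100))
```

Treat the results as order-of-magnitude guidance: longer dialogue scripts take longer to generate and cost proportionally more.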

Readme

MOSS-TTSD 🪐

Overview

MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken dialogue synthesis model that supports both Chinese and English. It can transform dialogue scripts between two speakers into natural, expressive conversational speech. MOSS-TTSD supports voice cloning and single-session speech generation of up to 960 seconds, making it ideal for AI podcast production.

Highlights

  • Highly Expressive Dialogue Speech: Built on a unified semantic-acoustic neural audio codec, a pre-trained large language model, millions of hours of TTS data, and 400k hours of synthetic and real conversational speech, MOSS-TTSD generates highly expressive, human-like dialogue speech with natural conversational prosody.
  • Two-Speaker Voice Cloning: MOSS-TTSD supports zero-shot two-speaker voice cloning and can generate conversational speech with accurate speaker switching based on dialogue scripts.
  • Chinese-English Bilingual Support: MOSS-TTSD enables highly expressive speech generation in both Chinese and English.
  • Long-Form Speech Generation (up to 960 seconds): Thanks to a low-bitrate codec and training framework optimization, MOSS-TTSD has been trained for long speech generation, enabling single-session speech generation of up to 960 seconds.
  • Fully Open Source & Commercial-Ready: MOSS-TTSD and its future updates will be fully open-source and support free commercial use.
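To make the two-speaker and 960-second constraints concrete, here is a minimal sketch of how a dialogue script might be prepared before synthesis. It rests on several assumptions: the speaker tags `[S1]` and `[S2]` are assumed markers (check the model's input documentation for the exact format), the per-turn durations are estimates the caller must supply, and all function names are hypothetical helpers rather than part of MOSS-TTSD.

```python
MAX_SESSION_SECONDS = 960  # single-session limit stated in the highlights


def build_script(turns, tags=("[S1]", "[S2]")):
    """Join (speaker_index, text) turns into one tagged script string.

    speaker_index must be 0 or 1, since the model handles two speakers.
    The tag strings are an assumed format, not confirmed by this page.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (0, 1):
            raise ValueError("dialogue scripts are limited to two speakers")
        text = text.strip()
        if text:  # skip empty turns
            parts.append(f"{tags[speaker]}{text}")
    return "".join(parts)


def split_into_sessions(turn_seconds, limit=MAX_SESSION_SECONDS):
    """Greedily group turn indices so each group's estimated length <= limit."""
    sessions, current, used = [], [], 0.0
    for index, seconds in enumerate(turn_seconds):
        if seconds > limit:
            raise ValueError(f"turn {index} alone exceeds the session limit")
        if current and used + seconds > limit:
            sessions.append(current)
            current, used = [], 0.0
        current.append(index)
        used += seconds
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    script = build_script([(0, "Welcome to the show."), (1, "Glad to be here.")])
    print(script)
    print(split_into_sessions([500, 400, 100, 900]))
```

Grouping by whole turns keeps each speaker's utterance intact; a script whose estimated length exceeds 960 seconds would then be synthesized as several sessions and concatenated afterwards.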