Troubleshooting
The checkpoints support English and Chinese.
If you’re having issues, try converting your reference audio to WAV or MP3 and clipping it to 15 seconds or less.
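Below is a minimal sketch of the clipping step. It uses only Python's standard `wave` module, so it works on WAV input only. The function name `clip_wav` and the file paths are hypothetical. The 15-second default mirrors the suggestion above.

```python
import wave

MAX_SECONDS = 15.0  # clip length suggested in the troubleshooting note


def clip_wav(src_path: str, dst_path: str, max_seconds: float = MAX_SECONDS) -> float:
    """Copy at most `max_seconds` of audio from src_path to dst_path (hypothetical helper).

    Returns the duration, in seconds, of the written file.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(params.framerate * max_seconds)
        n_frames = min(params.nframes, max_frames)  # shorter files are copied whole
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Keep channel count, sample width and sample rate unchanged.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)

    return n_frames / params.framerate
```

For example, `clip_wav("reference.wav", "reference_15s.wav")` writes the first 15 seconds of `reference.wav`. Converting from MP3 or another format needs an external tool. One option is FFmpeg, which can convert and clip in a single step: `ffmpeg -i reference.mp3 -t 15 reference.wav`.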
Credits
Batching code adapted from:
- https://github.com/nivibilla/local-llasa-tts

Model card:
- https://huggingface.co/HKUSTAudio/Llasa-3B
Model Information
Our model, Llasa, is a text-to-speech (TTS) system that extends the text-based LLaMA language models (1B, 3B, and 8B) by adding speech tokens from the XCodec2 codebook, which contains 65,536 tokens. We trained Llasa on 250,000 hours of Chinese and English speech data. The model can generate speech from input text alone or conditioned on a given speech prompt.
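The sketch below shows how discrete XCodec2 codes could be moved in and out of the language model's text stream. It assumes each code is written as a token string of the form `<|s_N|>`. That format and the helper names `speech_ids_to_tokens` and `tokens_to_speech_ids` are assumptions for illustration, so check them against the model card before relying on them. The only detail taken from the description above is the codebook size of 65,536 (2^16), which bounds the valid code IDs.

```python
import re

CODEBOOK_SIZE = 65_536  # XCodec2 codebook size from the model description

# ASSUMED token format for speech codes; verify against the model card.
SPEECH_TOKEN_RE = re.compile(r"<\|s_(\d+)\|>")


def speech_ids_to_tokens(speech_ids: list[int]) -> str:
    """Render XCodec2 code IDs as speech-token strings (hypothetical helper)."""
    for sid in speech_ids:
        if not 0 <= sid < CODEBOOK_SIZE:
            raise ValueError(f"speech id {sid} outside codebook of size {CODEBOOK_SIZE}")
    return "".join(f"<|s_{sid}|>" for sid in speech_ids)


def tokens_to_speech_ids(text: str) -> list[int]:
    """Extract code IDs from generated text, ignoring non-speech content (hypothetical helper)."""
    return [int(m) for m in SPEECH_TOKEN_RE.findall(text)]
```

Round-tripping `[0, 42, 65535]` through both helpers returns the same list. The recovered IDs are what a codec decoder would turn back into a waveform.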
Disclaimer
This model is licensed under the CC BY-NC-ND 4.0 License, which prohibits commercial use. Detected violations will result in legal consequences.
Using this codebase for any illegal purpose in any country or region is strictly prohibited. Please refer to your local laws, such as the DMCA and other related laws.