You're looking at a specific version of this model.
Input
Run this model in Node.js with one line of code:
npm install replicate
Then set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run jichengdu/llasa using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
"jichengdu/llasa:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
{
input: {
text: "为所有的猫猫奋斗终身!",
voice_sample: "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
}
}
);
// To access the file URL:
console.log(output.url()); //=> URL of the generated WAV file
// To write the file to disk (add `import { writeFile } from "node:fs/promises"` at the top):
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate's Python client library:
pip install replicate
Then set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run jichengdu/llasa using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
"jichengdu/llasa:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
input={
"text": "为所有的猫猫奋斗终身!",
"voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
}
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run jichengdu/llasa using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
"input": {
"text": "为所有的猫猫奋斗终身!",
"voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
}
}' \
https://api.replicate.com/v1/predictions
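If you prefer to assemble the request yourself rather than use curl, the same call can be sketched in Python. This is a minimal sketch: `build_prediction_request` is a helper name of my own (not part of any Replicate SDK), and `r8_example_token` is a placeholder token. It only builds the URL, headers, and JSON body; pass them to any HTTP client to actually send the request.

```python
import json

API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(token, version, model_input):
    # Mirror the headers from the curl command above.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # ask the API to hold the connection until the prediction finishes
    }
    body = json.dumps({"version": version, "input": model_input})
    return API_URL, headers, body

url, headers, body = build_prediction_request(
    "r8_example_token",  # placeholder; use your real REPLICATE_API_TOKEN
    "e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37",
    {
        "text": "为所有的猫猫奋斗终身!",
        "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav",
    },
)
# `url`, `headers`, and `body` can now be passed to urllib.request, requests, etc.
```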
To learn more, take a look at Replicate’s HTTP API reference docs.
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/jichengdu/llasa@sha256:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37 \
-i 'text="为所有的猫猫奋斗终身!"' \
-i 'voice_sample="https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/jichengdu/llasa@sha256:e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "text": "为所有的猫猫奋斗终身!",
      "voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{
"completed_at": "2025-03-24T13:44:11.898278Z",
"created_at": "2025-03-24T13:40:26.986000Z",
"data_removed": false,
"error": null,
"id": "1kqy6vrrx9rme0cns1h8bttbnc",
"input": {
"text": "为所有的猫猫奋斗终身!",
"voice_sample": "https://replicate.delivery/pbxt/MiFpnTHt7iIQ8LELP7yEKUvk1yO3HZwz9NquUVpOQ7SNPa74/zero_shot_prompt.wav"
},
"logs": "/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.\nwarnings.warn(\nDue to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.\nPassing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.43.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.\nWhisper transcription: 希望你以后能够做得比我还好哟\nPrompt Vq Code Shape: torch.Size([1, 1, 175])\nThe attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nSetting `pad_token_id` to `eos_token_id`:None for open-end generation.\nThe attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\nStarting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)",
"metrics": {
"predict_time": 6.727991584,
"total_time": 224.912278
},
"output": "https://replicate.delivery/xezq/GaVxDJMlWdqPFdeERPPGEZL7uz4HlfwEd2XGdbt6NGZrOxbUA/output.wav",
"started_at": "2025-03-24T13:44:05.170286Z",
"status": "succeeded",
"urls": {
"stream": "https://stream.replicate.com/v1/files/bcwr-rplxii7pg5ffqfbzl4re7lwa2ms2ifqektavidmhyjdevaqyshaq",
"get": "https://api.replicate.com/v1/predictions/1kqy6vrrx9rme0cns1h8bttbnc",
"cancel": "https://api.replicate.com/v1/predictions/1kqy6vrrx9rme0cns1h8bttbnc/cancel"
},
"version": "e159ffbd476eaad8ddc3d05b73074a618a32a0aa4efb2e652aba0268ef506f37"
}
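The prediction object above carries enough timing information to separate queue/setup time from model runtime. This sketch parses the timestamps from the example response (the field names are real; the JSON below is truncated to only the fields used) and checks them against the reported metrics.

```python
import json
from datetime import datetime

# Fields copied from the example prediction response above.
prediction = json.loads("""
{
  "created_at": "2025-03-24T13:40:26.986000Z",
  "started_at": "2025-03-24T13:44:05.170286Z",
  "completed_at": "2025-03-24T13:44:11.898278Z",
  "status": "succeeded",
  "output": "https://replicate.delivery/xezq/GaVxDJMlWdqPFdeERPPGEZL7uz4HlfwEd2XGdbt6NGZrOxbUA/output.wav",
  "metrics": {"predict_time": 6.727991584, "total_time": 224.912278}
}
""")

def parse_ts(ts):
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Time spent waiting for the model to start (cold boot + queue).
queue_seconds = (parse_ts(prediction["started_at"])
                 - parse_ts(prediction["created_at"])).total_seconds()
# Time the model actually spent running; should match metrics.predict_time.
run_seconds = (parse_ts(prediction["completed_at"])
               - parse_ts(prediction["started_at"])).total_seconds()
```

Here `run_seconds` comes out to about 6.73 s, matching `metrics.predict_time`, while the remaining ~218 s of `total_time` was spent before the prediction started.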