zsxkib /sonic:a2aad29e

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| image | string | | Input portrait image (will be cropped if face is detected). |
| audio | string | | Input audio file (WAV, MP3, etc.) for the voice. |
| dynamic_scale | number | 1 | Min: 0.5, Max: 2. Controls movement intensity. Increase/decrease for more/less movement. |
| min_resolution | integer | 512 | Min: 256, Max: 1024. Minimum image resolution for processing. Lower values use less memory but may reduce quality. |
| inference_steps | integer | 25 | Min: 5, Max: 50. Number of diffusion steps. Higher values may improve quality but take longer. |
| keep_resolution | boolean | False | If true, output video matches the original image resolution. Otherwise uses the min_resolution after cropping. |
| seed | integer | | Random seed for reproducible results. Leave blank for a random seed. |
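
The fields, defaults, and ranges listed above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the model's own code: the helper name `build_sonic_input` and the `_LIMITS` table are hypothetical, and only the field names, defaults, and min/max values come from the schema.

```python
from typing import Any, Dict, Optional

# Ranges copied from the input schema; the table name itself is hypothetical.
_LIMITS = {
    "dynamic_scale": (0.5, 2),
    "min_resolution": (256, 1024),
    "inference_steps": (5, 50),
}


def build_sonic_input(
    image: str,
    audio: str,
    dynamic_scale: float = 1,
    min_resolution: int = 512,
    inference_steps: int = 25,
    keep_resolution: bool = False,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an input dict using the schema defaults and validate the ranges.

    Hypothetical helper for illustration only.
    """
    ranged = {
        "dynamic_scale": dynamic_scale,
        "min_resolution": min_resolution,
        "inference_steps": inference_steps,
    }
    for name, (low, high) in _LIMITS.items():
        if not low <= ranged[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    payload: Dict[str, Any] = {
        "image": image,
        "audio": audio,
        **ranged,
        "keep_resolution": keep_resolution,
    }
    # Leave seed out entirely when unset, so that a random seed is used.
    if seed is not None:
        payload["seed"] = seed
    return payload


# Example with placeholder URLs (assumed, not from the source).
example = build_sonic_input(
    "https://example.com/portrait.png",
    "https://example.com/voice.wav",
    inference_steps=30,
)
```

A dict built this way could then be passed as the `input` argument of whichever API client you use; that step is omitted here because it needs credentials and the full version identifier.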

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'format': 'uri', 'title': 'Output', 'type': 'string'}
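
Since the schema describes the output as a single string in URI format, a caller may want to confirm that shape before trying to download anything. The sketch below is one way to do that with the Python standard library; the function name `output_filename` and the fallback name `"output"` are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlparse


def output_filename(output: str) -> str:
    """Check that the output looks like a URI and return its last path segment.

    Hypothetical helper: it only checks for a string with a scheme,
    which is a loose reading of 'format': 'uri'.
    """
    if not isinstance(output, str):
        raise TypeError("expected the model output to be a string")
    parts = urlparse(output)
    if not parts.scheme:
        raise ValueError("output does not look like a URI")
    # Fall back to a generic name when the path has no final segment.
    return posixpath.basename(parts.path) or "output"
```

For instance, `output_filename("https://example.com/files/out.mp4")` returns `"out.mp4"`, which could serve as a local file name when saving the result.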