
zsxkib/v-express:e0122658

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

Field Type Default value Description
reference_image
string
Path to the reference image that will be used as the base for the generated video.
driving_audio
string
Path to the audio file that will be used to drive the motion in the generated video.
use_video_audio
boolean
False
If True and a driving_video is provided, the audio from the driving video is used instead of driving_audio.
driving_video
string
Path to the video file that will be used to extract the head motion. If not provided, the generated video will use the motion based on the selected motion_mode.
motion_mode
string (enum)
fast

Options:

standard, gentle, normal, fast

Mode for generating the head motion in the output video.
reference_attention_weight
number
0.95

Max: 1

Amount of attention to pay to the reference image vs. the driving motion. Higher values will make the generated video adhere more closely to the reference image. Range: 0.0 to 1.0
audio_attention_weight
number
3

Max: 10

Amount of attention to pay to the driving audio vs. the reference image. Higher values will make the generated video's motion more closely match the driving audio. Range: 0.0 to 10.0
num_inference_steps
integer
25

Min: 1

Max: 100

Number of diffusion steps to perform during generation. More steps will generally produce better quality results but will take longer to run. Range: 1 to 100
image_width
integer
512

Min: 64

Max: 2048

Width of the generated video frames.
image_height
integer
512

Min: 64

Max: 2048

Height of the generated video frames.
frames_per_second
number
30

Min: 1

Max: 60

Frame rate of the generated video.
guidance_scale
number
3.5

Min: 1

Max: 20

Guidance scale for the diffusion model. Higher values will result in the generated video following the driving motion and audio more closely.
num_context_frames
integer
12

Min: 1

Max: 24

Number of context frames to use for motion estimation.
context_stride
integer
1

Min: 1

Max: 10

Stride of the context frames.
context_overlap
integer
4

Max: 24

Number of overlapping frames between context windows.
num_audio_padding_frames
integer
2

Max: 10

Number of audio frames to pad on each side of the driving audio.
seed
integer
Random seed. Leave blank to randomize the seed.
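Before sending a request, it can be useful to assemble the input payload locally from the defaults and Min/Max bounds listed above. The sketch below does exactly that; the helper names are illustrative and not part of any official client, and where the schema omits a minimum (context_overlap, num_audio_padding_frames) a floor of 0 is assumed.

```python
# Illustrative sketch: merge user overrides onto the schema defaults
# and range-check numeric fields before calling the API.

DEFAULTS = {
    "motion_mode": "fast",
    "reference_attention_weight": 0.95,
    "audio_attention_weight": 3,
    "num_inference_steps": 25,
    "image_width": 512,
    "image_height": 512,
    "frames_per_second": 30,
    "guidance_scale": 3.5,
    "num_context_frames": 12,
    "context_stride": 1,
    "context_overlap": 4,
    "num_audio_padding_frames": 2,
    "use_video_audio": False,
}

# (min, max) per the schema; minimums of 0 below are assumptions
# where the schema lists only a Max.
BOUNDS = {
    "reference_attention_weight": (0.0, 1.0),
    "audio_attention_weight": (0.0, 10.0),
    "num_inference_steps": (1, 100),
    "image_width": (64, 2048),
    "image_height": (64, 2048),
    "frames_per_second": (1, 60),
    "guidance_scale": (1, 20),
    "num_context_frames": (1, 24),
    "context_stride": (1, 10),
    "context_overlap": (0, 24),
    "num_audio_padding_frames": (0, 10),
}

def build_input(reference_image, driving_audio, **overrides):
    """Merge overrides onto the defaults and range-check each bounded field."""
    payload = {**DEFAULTS,
               "reference_image": reference_image,
               "driving_audio": driving_audio,
               **overrides}
    for field, (lo, hi) in BOUNDS.items():
        value = payload[field]
        if not (lo <= value <= hi):
            raise ValueError(f"{field}={value} is outside [{lo}, {hi}]")
    return payload
```

For example, `build_input("face.png", "speech.wav", num_inference_steps=30)` returns a payload with the requested step count and every other field at its default; a value such as `guidance_scale=25` would raise a ValueError because the schema caps it at 20.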

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
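Because the output schema is just a URI-formatted string pointing at the generated video, a client can sanity-check the response before trying to download it. A minimal sketch (the helper name and example URL are illustrative, not from the API):

```python
from urllib.parse import urlparse

def looks_like_output(value):
    """Check a response against the output schema: a string holding a URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Require both a scheme (e.g. https) and a host for a usable URI.
    return bool(parsed.scheme and parsed.netloc)
```

A well-formed response such as `"https://example.com/output.mp4"` passes this check, while a non-string or bare filename does not.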