zsxkib/v-express:2819249e | Run with an API on Replicate

You're looking at a specific version of this model.

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field	Type	Default value	Description
reference_image	string		Path to the reference image that will be used as the base for the generated video.
driving_audio	string		Path to the audio file that will be used to drive the motion in the generated video.
use_video_audio	boolean	False	If True and driving_video is provided, use the audio track from the driving video instead of driving_audio.
driving_video	string		Path to the video file from which the head motion is extracted. If not provided, the generated video uses motion based on the selected motion_mode.
motion_mode	string (enum)	fast Options: standard, gentle, normal, fast	Mode for generating the head motion in the output video.
reference_attention_weight	number	0.95 Max: 1	Amount of attention to pay to the reference image vs. the driving motion. Higher values will make the generated video adhere more closely to the reference image. Range: 0.0 to 1.0
audio_attention_weight	number	3 Max: 10	Amount of attention to pay to the driving audio vs. the reference image. Higher values will make the generated video's motion more closely match the driving audio. Range: 0.0 to 10.0
num_inference_steps	integer	25 Min: 1 Max: 100	Number of diffusion steps to perform during generation. More steps will generally produce better quality results but will take longer to run. Range: 1 to 100
image_width	integer	512 Min: 64 Max: 2048	Width of the generated video frames.
image_height	integer	512 Min: 64 Max: 2048	Height of the generated video frames.
frames_per_second	number	30 Min: 1 Max: 60	Frame rate of the generated video.
guidance_scale	number	3.5 Min: 1 Max: 20	Guidance scale for the diffusion model. Higher values will result in the generated video following the driving motion and audio more closely.
num_context_frames	integer	12 Min: 1 Max: 24	Number of context frames to use for motion estimation.
context_stride	integer	1 Min: 1 Max: 10	Stride of the context frames.
context_overlap	integer	4 Max: 24	Number of overlapping frames between context windows.
num_audio_padding_frames	integer	2 Max: 10	Number of audio frames to pad on each side of the driving audio.
seed	integer		Random seed. Leave blank to randomize the seed.
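
As a minimal sketch of how the constraints in the table above could be checked before sending a request, the following helper validates an input payload against the listed bounds and enum options. The names `validate_input`, `NUMERIC_BOUNDS`, and `MOTION_MODES` are hypothetical and not part of the model or of any Replicate library, and the example URLs are placeholders. Lower bounds of 0 for the two attention weights are taken from the "Range" notes in their descriptions; fields with no listed minimum are left unbounded below.

```python
# Bounds copied from the input schema table; None means "no bound listed".
NUMERIC_BOUNDS = {
    "reference_attention_weight": (0, 1),
    "audio_attention_weight": (0, 10),
    "num_inference_steps": (1, 100),
    "image_width": (64, 2048),
    "image_height": (64, 2048),
    "frames_per_second": (1, 60),
    "guidance_scale": (1, 20),
    "num_context_frames": (1, 24),
    "context_stride": (1, 10),
    "context_overlap": (None, 24),
    "num_audio_padding_frames": (None, 10),
}
MOTION_MODES = {"standard", "gentle", "normal", "fast"}
OTHER_FIELDS = {
    "reference_image", "driving_audio", "use_video_audio",
    "driving_video", "motion_mode", "seed",
}
KNOWN_FIELDS = set(NUMERIC_BOUNDS) | OTHER_FIELDS


def validate_input(payload):
    """Return a list of problems; an empty list means the payload fits the table."""
    problems = []
    # Flag field names that do not appear in the schema table.
    for name in payload:
        if name not in KNOWN_FIELDS:
            problems.append(f"unknown field: {name}")
    # motion_mode must be one of the listed enum options.
    mode = payload.get("motion_mode")
    if mode is not None and mode not in MOTION_MODES:
        problems.append(f"motion_mode must be one of {sorted(MOTION_MODES)}")
    # Numeric fields must fall within the listed Min/Max.
    for name, (low, high) in NUMERIC_BOUNDS.items():
        if name not in payload:
            continue
        value = payload[name]
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


# Example payload; the URLs are placeholders, not real assets.
example_input = {
    "reference_image": "https://example.com/face.png",
    "driving_audio": "https://example.com/speech.wav",
    "motion_mode": "fast",
    "num_inference_steps": 25,
    "frames_per_second": 30,
}
print(validate_input(example_input))
```

If you use the official `replicate` Python client, a payload like this would typically be passed as the `input` argument of `replicate.run`, together with the full version identifier shown on the model page.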

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{"type": "string", "format": "uri", "title": "Output"}
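
Since the output schema describes a single string in URI format, a caller may want to sanity-check the response before trying to download it. The sketch below is one loose way to do that with the standard library; `is_output_uri` is a hypothetical helper name, and the check only confirms that the value is a string with a URI scheme, not that the resource exists. Client libraries may wrap the raw value in their own types, so treat this as an illustration of the schema rather than of any particular client's return value.

```python
from urllib.parse import urlparse


def is_output_uri(value):
    """Loosely check that value is a string that parses as a URI with a scheme."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Require a scheme plus either a host or a path component.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


# Placeholder URL used only for illustration.
print(is_output_uri("https://example.com/output.mp4"))
```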