zsxkib / v-express

🫦 Realistic facial expression manipulation (lip-syncing) using audio or video

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware.

🎥 V-Express: Create Amazing Talking Portrait Videos

Follow me on X @zsakib_ for more AI projects and updates!

🌟 Bring Photos to Life with Talking Videos

V-Express is an AI tool that turns a single photo into a lifelike talking video. It’s like magic! You supply the audio, and the person in the picture appears to speak it, with matching mouth movements and facial expressions.

🎭 Unleash Your Creativity

  • Realistic Results: V-Express generates videos with mouth movements and facial expressions closely synchronized to the driving audio.
  • Easy to Use: Just give V-Express a photo, an audio clip, and (optionally) a driving video for head motion, and it will create the talking video for you.
  • High-Quality Videos: The progressive training strategy with conditional dropout (see the paper cited below) helps keep output quality high.

🎨 Lots of Cool Ways to Use V-Express

You can use V-Express in different ways:

  1. Same Person, Different Scene: Animate a photo of a person to match a video of that same person, even one shot in a different setting.
  2. Still Photo + Audio: Make the person in a still photo speak any audio clip you provide.
  3. Mix and Match: Drive one person’s photo with another person’s head motion while lip-syncing to a separate audio track (see the sketch after this list).
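
As a rough sketch, here’s how these three scenarios map onto the Replicate inputs documented in the next section. The file names are hypothetical placeholders:

```python
# Hypothetical input combinations for the three scenarios above.
# The keys match the Replicate inputs documented below; file names
# are placeholders.

# 1. Same Person, Different Scene: photo + a video of the same person
scenario_1 = {
    "reference_image": open("person_photo.jpg", "rb"),
    "driving_video": open("same_person_clip.mp4", "rb"),
    "use_video_audio": True,   # reuse the clip's own speech
}

# 2. Still Photo + Audio: photo + any audio clip; head motion is synthesized
scenario_2 = {
    "reference_image": open("person_photo.jpg", "rb"),
    "driving_audio": open("speech.mp3", "rb"),
    "motion_mode": "normal",
}

# 3. Mix and Match: photo + another person's video + separate audio
scenario_3 = {
    "reference_image": open("person_photo.jpg", "rb"),
    "driving_video": open("other_person_clip.mp4", "rb"),
    "driving_audio": open("speech.mp3", "rb"),
    "use_video_audio": False,  # lip-sync to driving_audio instead
}
```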

🛠️ Try V-Express on Replicate

You can easily make your own talking videos with V-Express on Replicate. Here’s what each input does (a usage example follows the list):

  • reference_image: A photo that will be used as the base for the video.
  • driving_audio: An audio clip that will be used to create the talking motion in the video.
  • use_video_audio: If you provide a driving_video, you can choose to use its audio instead of the driving_audio.
  • driving_video: A video that will be used to create the head motion in the generated video. If not provided, the motion will be based on the motion_mode you choose.
  • motion_mode: Choose how fast or slow the head motion should be in the video. You can pick from “standard”, “gentle”, “normal”, or “fast”.
  • reference_attention_weight: Decide how much the generated video should look like the reference image. A higher value means it will look more like the photo.
  • audio_attention_weight: Choose how much the video’s motion should match the driving audio. A higher value means the motion will match the audio more closely.
  • num_inference_steps: The number of steps V-Express takes to create the video. More steps usually mean better quality, but it will take longer.
  • image_width and image_height: The size of the generated video frames.
  • frames_per_second: The frame rate of the generated video.
  • guidance_scale: A setting that controls how closely the video follows the driving motion and audio. A higher value means it will follow them more closely.
  • num_context_frames, context_stride, and context_overlap: Advanced settings for motion estimation. You can leave these at their default values.
  • num_audio_padding_frames: The number of extra audio frames to use at the start and end of the driving audio.
  • seed: A random number that controls the video generation. If you leave it blank, V-Express will pick a random number for you.
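
Putting it together, here’s a minimal sketch using the Replicate Python client. The file paths are placeholders, the numeric values are illustrative assumptions rather than official defaults, and you should pin a specific model version hash from this page:

```python
import replicate

# Minimal sketch: generate a talking video from a portrait and an audio clip.
# Paths are placeholders; numeric values are illustrative, not official defaults.
output = replicate.run(
    "zsxkib/v-express",  # pin a version, e.g. "zsxkib/v-express:<hash>"
    input={
        "reference_image": open("portrait.jpg", "rb"),
        "driving_audio": open("speech.mp3", "rb"),
        "motion_mode": "normal",
        "num_inference_steps": 25,
        "image_width": 512,
        "image_height": 512,
        "frames_per_second": 30,
        "seed": 42,  # leave out for a random seed
    },
)
print(output)  # URL of the generated video
```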

Get ready to be amazed by the power of V-Express and create incredible talking videos! 🎉✨

⚠️ Important Things to Keep in Mind

  • V-Express can produce videos that look very real. Please use it responsibly and follow the license terms.
  • Don’t use generated videos for harmful purposes such as misinformation, fraud, or impersonation.
  • Respect people’s privacy and rights. Make sure you have permission before using someone’s photo or voice.
  • The creators of V-Express are not responsible for misuse of the tool.

By using V-Express, you promise to use it in a good and responsible way. Let’s make amazing videos while being kind and respectful to everyone! 🙌

✍️ Citation

@article{wang2024V-Express,
  title={V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation},
  author={Wang, Cong and Tian, Kuan and Zhang, Jun and Guan, Yonghang and Luo, Feng and Shen, Fei and Jiang, Zhiwei and Gu, Qing and Han, Xiao and Yang, Wei},
  journal={arXiv preprint arXiv:2406.02511},
  year={2024}
}