sync/lipsync-2

Generate realistic lipsyncs with Sync Labs' 2.0 model


Introducing Lipsync 2.0

Lipsync 2.0 is a zero-shot model for generating realistic lip movements that match spoken audio. It works out of the box—no training or fine-tuning needed—and preserves a speaker’s unique style across different languages and video types. Whether you’re working with live-action footage, animation, or AI-generated characters, Lipsync 2.0 brings new levels of realism, control, and speed.

What it does

Zero-shot: No waiting around for training. Just drop in your video and audio and Lipsync 2.0 handles the rest (a usage sketch follows this list).

Style preservation: The model picks up on how someone speaks just by watching them. Even when translating across languages, it keeps their signature delivery.

Cross-domain support: Works with live-action humans, animated characters, and AI-generated faces.

Flexible workflows: Use it for dubbing, editing words in post, or reanimating entire performances.
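
Because the model is zero-shot, a call can be as simple as passing a video and an audio track. Below is a minimal sketch using the Replicate Python client; the input field names ("video", "audio") are assumptions based on the description above, so check the model's API schema for the exact parameters.

```python
# Minimal sketch: lipsync a clip to a new audio track with sync/lipsync-2.
# The input names "video" and "audio" are assumptions; verify them against
# the model's schema on Replicate before running.
import replicate

output = replicate.run(
    "sync/lipsync-2",
    input={
        "video": open("speaker.mp4", "rb"),       # footage whose lips will be re-animated
        "audio": open("new_dialogue.wav", "rb"),  # speech the lip movements should match
    },
)

# The result is the lipsynced video (typically a URL or file handle).
print(output)
```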

Key features

Temperature control

Fine-tune how expressive the lipsync is. Make it subtle or dial it up depending on the scene (see the sketch at the end of this section).

Active speaker detection

Automatically detects who’s speaking in multi-person videos and applies lipsync only when that person is talking.

Flawless animation

Handles everything from stylized 3D characters to hyperreal AI avatars. Not just for translation: this unlocks editable dialogue in post-production.

Record once, edit forever

You don’t need multiple takes. Change dialogue after the fact while keeping the original speaker’s delivery intact.

Dub any video with AI

If you can generate a video with text, you can dub it too. No need to capture everything on camera anymore.
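
As a sketch of how the expressiveness control mentioned above might look in practice, the call below adds a temperature value to the same kind of request. The "temperature" field name and its range are assumptions drawn from the feature description, so confirm them against the model's API schema.

```python
# Sketch: dial expressiveness down for a subtle scene. The "temperature"
# input name and its 0-1 range are assumptions based on the feature description.
import replicate

output = replicate.run(
    "sync/lipsync-2",
    input={
        "video": open("scene.mp4", "rb"),
        "audio": open("dubbed_line.wav", "rb"),
        "temperature": 0.3,  # lower = more restrained lip movement, higher = more expressive
    },
)

print(output)
```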