Extended video synthesis model that generates 128 frames
Audio-based Lip Synchronization for Talking Head Video
Updated to OpenVoice v2: Versatile Instant Voice Cloning
A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Fast SDXL with higher quality
Convert LLM's coding to image generation
Depth estimation with faster inference speed, fewer parameters, and higher depth accuracy
CogVLM2: Visual Language Models for Image and Video Understanding
Generating Consistent Long Depth Sequences for Open-world Videos
Finer and Faster Text-to-Image Generation via Relay Diffusion
Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Sharp Monocular Metric Depth in Less Than a Second
Efficient Visual Generation with Hybrid Autoregressive Transformer
Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Depth Any Video with Scalable Synthetic Data
DiT-based video generation model for generating high-quality videos in real-time
Minimal and Universal Control for Diffusion Transformer - demo for Subject-driven generation
High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Autoregressive Video Generation without Vector Quantization
Autoregressive Image Generation without Vector Quantization
This model runs on A100 (80GB).