ConsistI2V

We propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos.

Citation

Please kindly cite our paper if you find our code, data, models or results to be helpful.

@article{ren2024consisti2v,
  title={ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation},
  author={Ren, Weiming and Yang, Harry and Zhang, Ge and Wei, Cong and Du, Xinrun and Huang, Stephen and Chen, Wenhu},
  journal={arXiv preprint arXiv:2402.04324},
  year={2024}
}

Acknowledgements

Our codebase is built upon AnimateDiff, FreeInit and 🤗 diffusers. Thanks for open-sourcing.