BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
This is the PyTorch implementation of the BLIP paper.
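For orientation, below is a minimal image-captioning sketch. It assumes the repo's `blip_decoder` factory in `models/blip.py`; the checkpoint path and image path are placeholders you would replace with a downloaded BLIP checkpoint and your own image.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

# Assumes this script runs from the repo root so models/blip.py is importable.
from models.blip import blip_decoder

image_size = 384  # BLIP captioning models are typically fine-tuned at 384x384

# Standard CLIP-style normalization used by the BLIP preprocessing pipeline.
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])

# Placeholder path: point this at a downloaded BLIP captioning checkpoint.
model = blip_decoder(pretrained='path/to/blip_caption_base.pth',
                     image_size=image_size, vit='base')
model.eval()

image = transform(Image.open('demo.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    # Beam search decoding; sample=True would switch to nucleus sampling.
    caption = model.generate(image, sample=False, num_beams=3,
                             max_length=20, min_length=5)
print(caption[0])
```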
Citation
If you find this code useful for your research, please consider citing:
```bibtex
@misc{li2022blip,
      title={BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
      author={Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
      year={2022},
      eprint={2201.12086},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
Acknowledgement
The implementation of BLIP relies on resources from ALBEF, Huggingface Transformers, and timm. We thank the original authors for open-sourcing their work.