TextDiffuser: Diffusion Models as Text Painters

TextDiffuser generates images containing visually appealing text that is coherent with the background. It is flexible and controllable: it creates high-quality text images from text prompts alone or combined with text template images, and it can perform text inpainting to reconstruct incomplete images containing text.
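
The model can be driven through the Replicate Python client, for example. The sketch below assumes the official replicate package and an API token in the REPLICATE_API_TOKEN environment variable; the input field names ("prompt", "num_images") are assumptions about this deployment's schema, not confirmed parameters, so check the model's API tab for the exact inputs.

import replicate

# Minimal sketch: generation from a plain text prompt.
# NOTE: the input names below are assumptions; verify them against the
# model's API schema on Replicate. Older clients may require a version
# pin such as "cjwbw/textdiffuser:<version>".
output = replicate.run(
    "cjwbw/textdiffuser",
    input={
        "prompt": 'A storefront sign that says "HELLO WORLD"',
        "num_images": 1,  # hypothetical parameter
    },
)
print(output)  # typically a list of generated image URLs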

Highlights

  • We propose TextDiffuser, a two-stage diffusion-based framework for text rendering. It generates accurate and coherent text images from text prompts, optionally guided by template images, and can also perform text inpainting to reconstruct incomplete images (a usage sketch follows this list).

  • We construct MARIO-10M, a large-scale dataset of image-text pairs with OCR annotations, including text recognition, detection, and character-level segmentation masks. (To be released.)
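
As noted in the first highlight, text inpainting takes an incomplete image together with a mask marking the region to reconstruct. A hedged sketch of that mode through the same client follows; the "task", "image", and "mask" fields are assumptions about this deployment's inputs, not a confirmed schema.

import replicate

# Hedged sketch of the text-inpainting mode; field names are assumptions.
with open("incomplete.png", "rb") as image, open("mask.png", "rb") as mask:
    output = replicate.run(
        "cjwbw/textdiffuser",
        input={
            "task": "inpainting",  # hypothetical mode switch
            "prompt": 'Restore the sign to read "OPEN"',
            "image": image,        # hypothetical file input
            "mask": mask,          # hypothetical file input
        },
    )
print(output)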

Acknowledgement

We sincerely thank the following projects: Hugging Face Diffusers, LAION, DB, PARSeq, img2dataset.

Special thanks also go to the open-source diffusion projects and publicly available demos: DALLE, Stable Diffusion, Stable Diffusion XL, Midjourney, ControlNet, DeepFloyd.

Contact

For help or issues using TextDiffuser, please email Jingye Chen (qwerty.chen@connect.ust.hk), Yupan Huang (huangyp28@mail2.sysu.edu.cn), or submit a GitHub issue.

For other communications related to TextDiffuser, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).

Citation

If you find this code useful in your research, please consider citing:

@article{chen2023textdiffuser,
  title={TextDiffuser: Diffusion Models as Text Painters},
  author={Chen, Jingye and Huang, Yupan and Lv, Tengchao and Cui, Lei and Chen, Qifeng and Wei, Furu},
  journal={arXiv preprint arXiv:2305.10855},
  year={2023}
}