Readme
Implementation of stabilityai/stable-diffusion-x4-upscaler
About
This model card focuses on the model associated with the Stable Diffusion Upscaler. The model was trained for 1.25M steps on a 10M subset of LAION containing images larger than 2048x2048. It was trained on crops of size 512x512 and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.
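To make the role of noise_level more concrete, the following is a minimal sketch of how noise could be added to a low-resolution input under a diffusion schedule. It is not the model's actual code: the linear beta schedule, its start and end values, the number of steps, and the helper names (linear_betas, cumulative_alphas, add_noise) are all assumptions chosen for illustration, and the real schedule may differ.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear beta schedule (an assumption, not the model's own)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, noise_level, alpha_bars, rng):
    """Forward-diffuse a flat list of pixel values to step `noise_level`.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)
    """
    alpha_bar = alpha_bars[noise_level]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


alpha_bars = cumulative_alphas(linear_betas())
rng = random.Random(0)

# Stand-in for a low-resolution image with values scaled to [-1, 1].
low_res = [0.5] * 16

lightly_noised = add_noise(low_res, 20, alpha_bars, rng)
heavily_noised = add_noise(low_res, 300, alpha_bars, rng)
```

Under these assumptions, a larger noise_level selects a later step in the schedule, so less of the original low-resolution signal survives and more Gaussian noise is mixed in.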
Model Details
- Developed by: Robin Rombach, Patrick Esser
- Model type: Diffusion-based text-to-image generation model
- Language(s): English
- License: CreativeML Open RAIL++-M License
- Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (OpenCLIP-ViT/H).
- Resources for more information: GitHub Repository.
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}