andreasjansson / stable-diffusion-inpainting

Inpainting using RunwayML's stable-diffusion-inpainting checkpoint

  • Public
  • 1.5M runs
  • GitHub
  • License

Run time and cost

This model costs approximately $0.019 to run on Replicate, or 52 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 14 seconds.
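As a sketch, the model can be invoked from Python with Replicate's client library. The input field names below (`prompt`, `image`, `mask`) are assumptions based on common inpainting model schemas and should be checked against this model's input schema before use; an API token is read from the environment.

```python
# Sketch: calling the model via Replicate's Python client.
# Assumes REPLICATE_API_TOKEN is set in the environment; the input
# field names (prompt, image, mask) are assumptions and should be
# verified against the model's published schema.
import os


def build_inputs(prompt, image_path, mask_path):
    """Assemble the input payload for a prediction.

    Local files are passed as open file handles; otherwise the value
    is forwarded as-is (e.g. an https:// URL).
    """
    return {
        "prompt": prompt,
        "image": open(image_path, "rb") if os.path.exists(image_path) else image_path,
        "mask": open(mask_path, "rb") if os.path.exists(mask_path) else mask_path,
    }


def run_inpainting(prompt, image_path, mask_path):
    import replicate  # pip install replicate

    return replicate.run(
        "andreasjansson/stable-diffusion-inpainting",
        input=build_inputs(prompt, image_path, mask_path),
    )


if __name__ == "__main__":
    inputs = build_inputs("a vase of flowers", "photo.png", "mask.png")
    print(sorted(inputs.keys()))
```

The prediction itself requires network access and billing, so `run_inpainting` is kept separate from the payload-building helper.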

Readme

Stable Diffusion Inpainting

Checkpoint: https://huggingface.co/runwayml/stable-diffusion-v1-5

Tip: Get a high-quality image mask by using https://replicate.com/arielreplicate/dichotomous_image_segmentation
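Whether a segmentation model produces the mask or you build one by hand, the mask is a single-channel image; by the usual convention (an assumption here, worth verifying against this model's docs) white pixels mark the region to repaint and black pixels are preserved. A minimal numpy sketch:

```python
# Sketch: building a rectangular inpainting mask as a grayscale array.
# Convention assumed: 255 (white) = region to inpaint, 0 (black) = keep.
import numpy as np


def rectangle_mask(height, width, top, left, bottom, right):
    """Return a uint8 mask with a white rectangle over the inpaint region."""
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[top:bottom, left:right] = 255
    return mask


mask = rectangle_mask(512, 512, 128, 128, 384, 384)
print(mask.sum() // 255)  # number of masked pixels: 256 * 256 = 65536
```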

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images from any text input, with the additional capability of inpainting images using a mask.

Stable-Diffusion-Inpainting was initialized with the weights of Stable-Diffusion-v-1-2. It received 595k steps of regular training, followed by 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+”, with the text-conditioning dropped 10% of the time to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, synthetic masks are generated, and in 25% of cases the entire image is masked.
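The 9-channel UNet input described above (4 noisy-latent channels + 4 masked-image latent channels + 1 mask channel) can be sketched with dummy arrays; the shapes assume Stable Diffusion's standard 8x spatial downsampling (512 → 64) in latent space:

```python
# Sketch: assembling the 9-channel UNet input used for inpainting.
# Dummy numpy arrays stand in for the VAE-encoded latents; the 8x
# downsampling (512 -> 64) matches standard Stable Diffusion latents.
import numpy as np

latent = np.random.randn(4, 64, 64)               # noisy image latent
masked_image_latent = np.random.randn(4, 64, 64)  # VAE encoding of the masked image
mask = (np.random.rand(1, 64, 64) > 0.5)          # binary mask at latent resolution

unet_input = np.concatenate(
    [latent, masked_image_latent, mask.astype(np.float32)], axis=0
)
print(unet_input.shape)  # (9, 64, 64): 4 + 4 + 1 input channels
```

Zero-initializing the 5 extra channels' weights means the restored checkpoint initially behaves like the non-inpainting model, and the mask conditioning is learned from there.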