adirik/local-prompt-mixing | Run with an API on Replicate

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Local Prompt Mixing

Local Prompt Mixing is an image-to-image model built on Stable Diffusion 1.4. It generates variations of an object in an image while preserving the other elements of the image. See the original repository, project page, or paper for details.

How to use the API

To use Local Prompt Mixing, upload an image (.jpg or .png) containing the object you want to modify. Provide a simple description of the image (prompt) and the name of the object to modify (object_of_interest). The outputs (the individual variations of the object and a grid image containing all of them) are in .jpg format. The API input arguments are as follows:

prompt: a simple description of the image
object_of_interest: the object that you want to generate variations of
proxy_words: the object(s) that you want to generate in place of object_of_interest
objects_to_preserve: the object(s) that you want to preserve in the image
number_of_variations: the number of auto-generated proxy words, used when you don’t provide any proxy_words
steps: the number of denoising steps for Stable Diffusion
start_prompt_range: the denoising step at which prompt mixing begins
end_prompt_range: the denoising step at which prompt mixing ends
guidance_scale: the guidance scale of Stable Diffusion
seed: random seed for reproducibility; the default value is 10. Any fixed value gives deterministic generation.
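
As a rough sketch of how these arguments fit together, the Python snippet below builds an input payload and passes it to the Replicate Python client's replicate.run. The parameter names come from the list above. Everything else is an assumption: the build_input and run_model helpers are hypothetical, the numeric values (other than the seed default of 10) are illustrative, the "image" input name and the comma-separated string format for proxy_words and objects_to_preserve are guesses, and the model reference may need an explicit version suffix. Check the model's API schema before relying on any of these.

```python
from typing import Any, Dict, Optional

# Assumption: community models may need an explicit ":<version>" suffix here.
MODEL_REF = "adirik/local-prompt-mixing"


def build_input(
    prompt: str,
    object_of_interest: str,
    proxy_words: Optional[str] = None,          # assumed format: "light,lantern"
    objects_to_preserve: Optional[str] = None,  # assumed format: "table,wall"
    number_of_variations: int = 3,              # illustrative value
    steps: int = 50,                            # illustrative value
    start_prompt_range: int = 7,                # illustrative value
    end_prompt_range: int = 17,                 # illustrative value
    guidance_scale: float = 7.5,                # illustrative value
    seed: int = 10,                             # default stated in the README
) -> Dict[str, Any]:
    """Collect the documented arguments into one input dictionary."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "object_of_interest": object_of_interest,
        "steps": steps,
        "start_prompt_range": start_prompt_range,
        "end_prompt_range": end_prompt_range,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
    # Use either explicit proxy words or auto-generated ones, not both.
    if proxy_words:
        payload["proxy_words"] = proxy_words
    else:
        payload["number_of_variations"] = number_of_variations
    if objects_to_preserve:
        payload["objects_to_preserve"] = objects_to_preserve
    return payload


def run_model(image_path: str, payload: Dict[str, Any]) -> Any:
    """Send the payload and image to Replicate (needs REPLICATE_API_TOKEN)."""
    import replicate  # pip install replicate

    with open(image_path, "rb") as image_file:
        # "image" as the input name is an assumption, not taken from the README.
        return replicate.run(MODEL_REF, input={**payload, "image": image_file})
```

A call might then look like run_model("room.jpg", build_input("a table below a lamp", "lamp")), where room.jpg is a placeholder file name.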

Important Notes

Your prompt must contain the word for object_of_interest (e.g. prompt: “a table below a lamp”, object_of_interest: “lamp”); otherwise the API will not work properly.
This API offers two options for proxy words. If you provide your own words, Stable Diffusion will try to generate variations of the object of interest according to them; results are better when the proxy words are semantically close to the object of interest (e.g. lamp -> light). If you don’t provide proxy words, the API selects words that are semantically close to your object of interest, and the number_of_variations parameter determines how many words are auto-generated. Choose one of the two approaches.
Both start_prompt_range and end_prompt_range must be smaller than steps, and start_prompt_range must be smaller than end_prompt_range.
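
The constraints in these notes can be checked on the client side before a request is sent. The following Python sketch is one possible way to do that; the check_inputs helper is hypothetical, and the case-insensitive whole-word match used for the prompt check is an assumption about what "contains the word" means, not the API's actual behavior.

```python
import re
from typing import List


def check_inputs(
    prompt: str,
    object_of_interest: str,
    steps: int,
    start_prompt_range: int,
    end_prompt_range: int,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    # The prompt must mention the object of interest (assumed: whole word,
    # case-insensitive).
    pattern = r"\b" + re.escape(object_of_interest) + r"\b"
    if not re.search(pattern, prompt, flags=re.IGNORECASE):
        problems.append("prompt does not contain object_of_interest")

    # Both range bounds must be smaller than the number of denoising steps.
    if start_prompt_range >= steps:
        problems.append("start_prompt_range must be smaller than steps")
    if end_prompt_range >= steps:
        problems.append("end_prompt_range must be smaller than steps")

    # Mixing must start before it ends.
    if start_prompt_range >= end_prompt_range:
        problems.append("start_prompt_range must be smaller than end_prompt_range")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violated constraint at once.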

References

@InProceedings{patashnik2023localizing,
    author    = {Patashnik, Or and Garibi, Daniel and Azuri, Idan and Averbuch-Elor, Hadar and Cohen-Or, Daniel},
    title     = {Localizing Object-level Shape Variations with Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year      = {2023}
}