kuprel / min-dalle

Fast, minimal port of DALL·E Mini to PyTorch

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 77 seconds, though run time varies significantly with the inputs.

Readme

Colab

Input Parameter Descriptions

Basic

  • text: For long prompts, only the first 64 tokens will be used to generate the image.
  • save_as_png: If selected, the image is saved in lossless PNG format; otherwise it is saved as a JPEG.
  • progressive_outputs: Show intermediate outputs while running. This adds less than a second to the run time.
  • seamless: Tile images in token space instead of pixel space. This has the effect of blending the images at the borders.
  • grid_size: Side length of the square image grid. A 5x5 grid takes about 15 seconds, a 9x9 grid about 40 seconds. The sketch after this list shows these parameters in use.
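For illustration, the following sketch calls the model with these basic parameters through the Replicate Python client. The input keys come from the descriptions above; the "kuprel/min-dalle" identifier, the list-of-URLs output, and the need for a REPLICATE_API_TOKEN environment variable are assumptions about the hosting setup rather than details stated on this page.

import replicate

# Hypothetical call: input keys mirror the basic parameters documented above.
output = replicate.run(
    "kuprel/min-dalle",  # assumed model identifier; a pinned version may be required
    input={
        "text": "panda with top hat reading a book",
        "save_as_png": True,          # lossless PNG instead of JPEG
        "progressive_outputs": True,  # return intermediate grids while running
        "seamless": False,            # tile in token space to blend borders
        "grid_size": 5,               # 5x5 grid, roughly 15 seconds
    },
)
print(output)  # typically one URL (or file) per progressive output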

Advanced

  • temperature: Higher temperature increases the probability of sampling lower-scoring image tokens.
  • top_k: Each image token is sampled from the top-k scoring tokens.

Increasing temperature and/or top_k increases variety in the generated images at the expense of coherence. Setting temperature high while keeping top_k low can produce more variety without sacrificing coherence.
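To make the interaction concrete, here is a minimal, self-contained sketch of top-k sampling with a temperature, the standard technique these two parameters refer to. It illustrates the sampling rule only and is not the model's actual decoding code; the vocabulary size is a placeholder.

import torch

def sample_image_token(logits: torch.Tensor, temperature: float, top_k: int) -> int:
    # Keep only the top_k highest-scoring tokens.
    values, indices = torch.topk(logits, top_k)
    # Dividing by a higher temperature flattens the distribution, so the
    # lower-scoring tokens within the top_k become more likely to be drawn.
    probs = torch.softmax(values / temperature, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return indices[choice].item()

logits = torch.randn(16384)  # placeholder scores over an image-token vocabulary
token = sample_image_token(logits, temperature=0.5, top_k=128)

With a small top_k, only the strongest candidates survive the cut, so even a high temperature redistributes probability among tokens that are already plausible, which is why that combination adds variety without losing much coherence.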

Expert

  • supercondition_factor: Higher values can result in better agreement with the text. Let logits_cond be the logits computed from the text prompt, logits_uncond the logits computed from an empty text prompt, and a the super-condition factor; then logits = logits_cond * a + logits_uncond * (1 - a), as sketched below.
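Written as code, the super-conditioning step is just that weighted combination of the two sets of logits. This is a sketch of the formula above, not the model's internal implementation:

import torch

def supercondition(logits_cond: torch.Tensor, logits_uncond: torch.Tensor, a: float) -> torch.Tensor:
    # logits = logits_cond * a + logits_uncond * (1 - a), per the formula above.
    # Equivalently logits_uncond + a * (logits_cond - logits_uncond): with a > 1
    # the result is pushed away from the unconditional logits and toward
    # stronger agreement with the text prompt.
    return logits_cond * a + logits_uncond * (1 - a)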

Example

Consider the images generated for “panda with top hat reading a book” with different settings.

text = "panda with top hat reading a book"
temperature = 0.5
top_k = 128
supercondition_factor = 4

[min-dalle output: image grid generated with these settings]

text = "panda with top hat reading a book"
temperature = 4
top_k = 64
supercondition_factor = 16

[min-dalle output: image grid generated with these settings]