zsxkib / dream-o

👗 Bytedance's DreamO: unified image customization model (IP, ID, Style, Try-On, etc.) 🧣

Run time and cost

This model costs approximately $0.33 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.
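For budgeting, the approximate per-run figure above can be turned into a quick estimate. This is a minimal sketch: the $0.33 value is the rough figure quoted on this page, actual cost varies with inputs, and the function names are illustrative only.

```python
import math

# Approximate per-run cost quoted above; real cost depends on the inputs.
COST_PER_RUN_USD = 0.33


def runs_for_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Number of whole runs a budget covers at the approximate per-run cost."""
    if budget_usd < 0 or cost_per_run <= 0:
        raise ValueError("budget must be >= 0 and cost_per_run must be > 0")
    return math.floor(budget_usd / cost_per_run)


def estimated_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total cost in USD for a number of runs, rounded to cents."""
    if num_runs < 0:
        raise ValueError("num_runs must be >= 0")
    return round(num_runs * cost_per_run, 2)


print(runs_for_budget(1.0))   # whole runs covered by $1
print(estimated_cost(100))    # rough cost of 100 runs
```

Treat the result as an order-of-magnitude guide, since predict time varies significantly between inputs.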

Readme

DreamO: Unified Image Customization 🎨 (Cog Implementation)

This Replicate model runs DreamO, a unified framework for image customization developed by Bytedance. It excels at tasks like subject-driven generation (IP-Adapter/PuLID style), virtual try-on, and style transfer, leveraging the FLUX.1-dev model as its backbone.

  • Original Project (GitHub): bytedance/DreamO
  • arXiv Paper: 2504.16915: DreamO: A Unified Framework for Image Customization
  • Core HF Weights: black-forest-labs/FLUX.1-dev (DreamO Pipeline) & PramaLLC/BEN2 (Background Removal)


About the DreamO Model

DreamO is a powerful image customization framework designed to handle a variety of conditioning inputs simultaneously. By leveraging VAE-based feature encoding and a novel feature routing constraint, DreamO can effectively mitigate conflicts and entanglement among multiple entities or style conditions. This allows for high-fidelity generation across different tasks such as character/object insertion (IP), face identity preservation (ID), virtual try-on, and style application.

Key Features & Capabilities ✨

  • IP (Identity Preservation - General) 🖼️: Similar to IP-Adapter, supports a wide range of inputs including characters, objects, and animals. Achieves high fidelity in preserving entity identity.
  • ID (Identity Preservation - Face) 👩: Focuses specifically on facial identity, similar to InstantID and PuLID.
  • Try-On 👚👒: Supports virtual try-on for items like tops, bottoms, glasses, and hats, even with multiple garments (a capability generalized from its training).
  • Style Transfer 🎨: Applies the style of a reference image to a new generation. (Note: Currently less stable than other tasks and cannot be combined with other conditions in the original implementation.)
  • Multi-Condition Generation ➕: Can combine multiple conditions (e.g., ID + IP, multiple IPs) to generate more creative and complex images, effectively managing potential conflicts between conditions.
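The combination rules in the list above (ID and IP can be mixed, several IPs can be used together, style cannot be combined with other conditions) can be checked before submitting a request. The sketch below is only an illustration: the task labels `ip`, `id`, `try_on`, and `style` and the function name are hypothetical, not the endpoint's actual input names.

```python
# Hypothetical task labels for illustration; not the endpoint's real input schema.
KNOWN_TASKS = {"ip", "id", "try_on", "style"}


def check_conditions(tasks):
    """Return a list of problems with a proposed list of reference-image tasks."""
    problems = []
    unknown = sorted(set(tasks) - KNOWN_TASKS)
    if unknown:
        problems.append("unknown task(s): " + ", ".join(unknown))
    # Per the notes above, style cannot be mixed with any other condition.
    if "style" in tasks and any(t != "style" for t in tasks):
        problems.append("style cannot be combined with other conditions")
    return problems


print(check_conditions(["id", "ip"]))     # ID + IP is an allowed combination
print(check_conditions(["style", "id"]))  # style mixed with ID is flagged
```

An empty list means the combination does not violate the documented restriction; it says nothing about output quality.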

Underlying Technologies & Concepts 🔬

  • FLUX Backbone: Leverages the powerful FLUX.1-dev text-to-image model. DreamO uses FLUX-turbo LoRA by default for faster inference.
  • VAE-based Feature Encoding: Utilized for encoding reference images to capture high-fidelity details.
  • Feature Routing Constraint: A key proposal in the DreamO paper to mitigate conflicts and entanglement when multiple conditions are applied.
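This README does not spell out how the feature routing constraint is formulated. As a rough intuition only, one way such a constraint could look is a loss that pushes the similarity between a condition's tokens and the image tokens toward the region where that subject should appear. Everything in the sketch below (the scaled dot-product similarity, the sigmoid, the mean-squared error, and all names) is an assumption made for illustration, not the paper's or the repository's implementation.

```python
import math


def routing_loss(cond_tokens, image_tokens, target_mask):
    """Illustrative routing-style loss (assumed form, not DreamO's actual code).

    cond_tokens:  list of feature vectors for one reference condition
    image_tokens: list of feature vectors, one per image location
    target_mask:  list of 0/1 values, 1 where the subject should appear
    """
    if not cond_tokens or not image_tokens:
        raise ValueError("need at least one condition token and one image token")
    if len(image_tokens) != len(target_mask):
        raise ValueError("target_mask needs one entry per image token")
    scale = 1.0 / math.sqrt(len(image_tokens[0]))
    total = 0.0
    for img, target in zip(image_tokens, target_mask):
        # Scaled dot-product similarity to each condition token, then averaged.
        sims = [scale * sum(c * x for c, x in zip(cond, img)) for cond in cond_tokens]
        mean_sim = sum(sims) / len(sims)
        # Squash to (0, 1) so the response is comparable with the mask.
        response = 1.0 / (1.0 + math.exp(-mean_sim))
        total += (response - target) ** 2
    return total / len(image_tokens)


cond = [[4.0, 0.0]]
image = [[4.0, 0.0], [-4.0, 0.0]]
print(routing_loss(cond, image, [1, 0]))  # response agrees with the mask
print(routing_loss(cond, image, [0, 1]))  # response disagrees with the mask
```

The loss is small when the condition responds where the mask says the subject is and large otherwise, which is the intuition behind keeping multiple conditions from bleeding into each other.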

Use Cases 💡

  • Creating personalized avatars or character portraits with specific facial identities.
  • Generating images of objects or characters in new scenes or styles.
  • Virtually trying on clothing or accessories.
  • Applying artistic styles from one image to another.
  • Combining multiple reference subjects or styles into a single cohesive image.

Limitations ⚠️

  • Style Task Stability: As noted in the original repository, style consistency is currently less stable compared to other tasks, and in the current version, style cannot be combined with other conditions.
  • ID Task Nuances: While DreamO achieves high facial fidelity for ID tasks, the original paper notes it may introduce more model contamination compared to SOTA approaches like PuLID. Lowering guidance can sometimes help with “glossy” faces.
  • Resource Intensive: Requires a capable GPU (Nvidia A100 80GB on Replicate).

License & Disclaimer 📜

The original DreamO project is licensed under the Apache-2.0 License. See the LICENSE file in the original repository.

Disclaimer (from bytedance/DreamO): This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

This Replicate endpoint is provided for experimentation based on the original work. Users must adhere to the original license and disclaimer.

Citation 📚

If you find DreamO useful for your research, please consider citing their paper:

@misc{wu2025dreamo,
      title={DreamO: A Unified Framework for Image Customization}, 
      author={Yanze Wu and Yutong Feng and Difan Liu and Jiarui Sabir IARIVOAHY and Zicheng Liu and Qiang Wen and Yuedong Yang and Ming-Hsuan Yang and Chong Mou},
      year={2025},
      eprint={2504.16915},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Cog implementation managed by zsxkib.

Star the original repo on GitHub: bytedance/DreamO ⭐
