this post was submitted on 20 Sep 2024
13 points (88.2% liked)

Stable Diffusion

Abstract

In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control conditions. OmniGen is characterized by the following features:

1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports other downstream tasks, such as image editing, subject-driven generation, and visual-conditional generation. Additionally, OmniGen can handle classical computer vision tasks, such as edge detection and human pose recognition, by transforming them into image generation tasks.

2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional text encoders. Moreover, it is more user-friendly than existing diffusion models, enabling complex tasks to be accomplished through instructions without extra preprocessing steps (e.g., human pose estimation), thereby significantly simplifying the image generation workflow.

3) Knowledge Transfer: Through learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities.

We also explore the model's reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and several issues remain unresolved. We will open-source the related resources at https://github.com/VectorSpaceLab/OmniGen to foster advancements in this field.
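To make the "unified format" idea concrete, here is a minimal sketch, assuming only what the abstract says: every task is expressed as an instruction plus zero or more input images, with an image as output. The `GenerationRequest` class, its fields, the `task_kind` helper, and the example instructions and file names are all hypothetical illustrations, not OmniGen's actual API (the code had not been released when this was posted).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenerationRequest:
    """Hypothetical unified request: one text instruction plus optional conditioning images."""
    instruction: str
    input_images: List[str] = field(default_factory=list)  # hypothetical image paths/IDs

    def task_kind(self) -> str:
        # Plain text-to-image has no input image; every other task is image-conditioned.
        return "text-to-image" if not self.input_images else "image-conditioned"


# Tasks that would otherwise need separate modules (ControlNet, IP-Adapter, a pose
# estimator) all share one request shape; only the instruction and images change.
requests = [
    GenerationRequest("A red bicycle leaning against a brick wall"),
    GenerationRequest("Make the sky in the image overcast", ["photo.png"]),
    GenerationRequest("Detect the edges of the image", ["photo.png"]),
    GenerationRequest("Estimate the human pose in the image", ["person.png"]),
    GenerationRequest("The person from the first image riding the bicycle",
                      ["person.png", "bike.png"]),
]

for r in requests:
    print(f"{r.task_kind():<18} images={len(r.input_images)}  {r.instruction}")
```

The point of the sketch is the design choice the abstract describes: one model and one input schema replace a stack of task-specific add-ons, so a new task is a new instruction rather than a new module.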

Paper: https://arxiv.org/abs/2409.11340

Code: https://github.com/VectorSpaceLab/OmniGen (coming soon)

dysprosium@lemmy.dbzer0.com 5 points 22 hours ago

Super cool, but is it also completely self-hostable?

Even_Adder@lemmy.dbzer0.com 6 points 22 hours ago

They said they would be open sourcing it.

fhein@lemmy.world 6 points 21 hours ago

Only 3.8B parameters according to the paper, so it ought to be quite easy on the hardware as well if they do.
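A quick back-of-the-envelope check of the hardware claim: a minimal sketch that only estimates the size of the raw weights for 3.8B parameters at common precisions. It assumes the 3.8B figure quoted in the comment covers the weights you would load; it ignores activations, any separate image autoencoder, and framework overhead, so real VRAM use while sampling will be higher. The `weight_gib` helper is a hypothetical name for illustration.

```python
# Weight-only memory estimate for a model with a given parameter count.
# Assumption: excludes activations, any separate autoencoder, and runtime overhead.

PARAMS = 3.8e9  # parameter count quoted from the paper in the comment above

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "4-bit": 0.5,
}


def weight_gib(params: float, bytes_per_param: float) -> float:
    """Size of the raw weights in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>10}: {weight_gib(PARAMS, nbytes):5.2f} GiB")
```

At fp16/bf16 this works out to roughly 7 GiB of weights, which is why a model of this size looks plausible on a single consumer GPU, provided the released code supports half precision.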

Even_Adder@lemmy.dbzer0.com 7 points 21 hours ago

That's kind of unbelievable given what they say it can do.

Yomope@lemmy.ml 2 points 16 hours ago

I must agree. But since they claim "few-shot" throughout the paper and they publish failure cases (which I rarely see), I tend to believe that this model can do everything they say, BUT not consistently. Still, this looks like a very promising PoC for this kind of model.

django@discuss.tchncs.de 2 points 19 hours ago

Do they have a schedule for the release?

Even_Adder@lemmy.dbzer0.com 3 points 18 hours ago

Doesn't seem like it.