Stable Diffusion

Discuss matters related to our favourite AI Art generation technology

Technical report on SDXL released! (github.com)

submitted 1 year ago by pablonaj@feddit.de to c/stable_diffusion@lemmy.dbzer0.com

Kiwieye@lemmy.world 8 points 1 year ago* (last edited 1 year ago)

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights.
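
For anyone wondering what "train SDXL on multiple aspect ratios" might look like in practice: the abstract gives no details, so here is a rough sketch of one common approach, aspect-ratio bucketing, where each training image is assigned to the resolution bucket closest to its own shape. Everything in it is my assumption for illustration and not taken from the report: the function names `make_buckets` and `nearest_bucket`, the roughly 1024x1024 pixel budget, the 64-pixel step, and the 512 to 2048 side limits.

```python
import math


def make_buckets(target_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) pairs whose sides are multiples of `step`
    and whose pixel count stays close to `target_area`.

    All defaults are illustrative assumptions, not values from the report.
    """
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Height that best preserves the target area, snapped to the step grid.
        height = round(target_area / width / step) * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    symmetrically around the square case.
    """
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))


if __name__ == "__main__":
    buckets = make_buckets()
    for size in [(1920, 1080), (500, 500), (800, 1200)]:
        print(size, "->", nearest_bucket(*size, buckets))
```

With these assumed settings a 16:9 image lands in a wide bucket and a square image in the 1024x1024 one. Images in the same bucket can then be resized and batched together, which is the point of bucketing: every batch has a single tensor shape even though the dataset mixes shapes.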