Task: Text2Image, Inpainting
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.
| Model | Task | Dataset | Download |
| --- | --- | --- | --- |
| stable_diffusion_xl | Text2Image | - | - |
We use the Stable Diffusion XL weights. The model consists of several components, including the VAE, the UNet, and the CLIP text encoders.
You may download the weights from stable-diffusion-xl and change `from_pretrained` in the config to the local weights directory, as sketched below.
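For example, if the weights have been downloaded locally, the relevant config fields might look like the following sketch. The key names (`unet`, `vae`, `from_pretrained`, `subfolder`) follow the typical MMagic config layout and the path is hypothetical; check the actual config file for the exact structure.

```python
# Illustrative sketch: point pretrained components at a local weights
# directory instead of a Hugging Face model id. Verify the exact keys in
# configs/stable_diffusion_xl/stable-diffusion_xl_ddim_denoisingunet.py.
pretrained_dir = '/path/to/stable-diffusion-xl'  # hypothetical local path

model = dict(
    type='StableDiffusionXL',
    unet=dict(
        type='UNet2DConditionModel',
        subfolder='unet',
        from_pretrained=pretrained_dir),
    vae=dict(
        type='AutoencoderKL',
        subfolder='vae',
        from_pretrained=pretrained_dir),
)
```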
Running the following code, you can generate an image from a text prompt.
```python
from mmengine import MODELS, Config
from mmengine.registry import init_default_scope

init_default_scope('mmagic')

# Load the SDXL config and build the model.
config = 'configs/stable_diffusion_xl/stable-diffusion_xl_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()
StableDiffuser = MODELS.build(config.model)

# Move the model to GPU and generate an image from a text prompt.
prompt = 'A mecha robot in a favela in expressionist style'
StableDiffuser = StableDiffuser.to('cuda')
image = StableDiffuser.infer(prompt)['samples'][0]
image.save('robot.png')
```
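The `infer` call also accepts additional generation options. The keyword arguments below mirror the underlying diffusers SDXL pipeline and are shown as an assumption rather than a guaranteed interface; check the `infer()` signature of your MMagic version before relying on them.

```python
# Hypothetical usage sketch: these keyword arguments are assumed to be
# forwarded to the underlying diffusers SDXL pipeline.
result = StableDiffuser.infer(
    prompt,
    negative_prompt='low quality, blurry',
    num_inference_steps=50,
    guidance_scale=7.5,
)
result['samples'][0].save('robot_tuned.png')
```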
Our codebase for the stable diffusion models builds heavily on the diffusers codebase, and the model weights are from stable-diffusion-xl.
Thanks to the community for their efforts!