This repository is based on our recently released survey Conditional Image Synthesis with Diffusion Models: A Survey
Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Fellow, IEEE and Can Wang
Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective approach to conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms make it challenging for researchers to keep up with rapid developments and to understand the core concepts of this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches across the training, re-purposing, and specialization stages used to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems and suggest possible solutions.
📆2024-10-05: Our comprehensive survey paper, summarizing related methods published before October 1, 2024, is now available.
```bibtex
@article{zhan2024conditional,
  title={Conditional Image Synthesis with Diffusion Models: A Survey},
  author={Zhan, Zheyuan and Chen, Defang and Mei, Jian-Ping and Zhao, Zhenghe and Chen, Jiawei and Chen, Chun and Lyu, Siwei and Wang, Can},
  journal={arXiv preprint arXiv:2409.19365},
  year={2024}
}
```
- Overview
- Papers
The two figures below illustrate, respectively, the taxonomy of diffusion-based conditional image synthesis (DCIS) used in this survey and the categorization of conditional image synthesis tasks.
The Date column in each table gives the publication date of the first version of the paper on arXiv.
This figure provides an exemplar workflow for building the desired denoising network for conditional synthesis tasks, including text-to-image, visual signals to image, and customization, via the three condition integration stages (training, re-purposing, and specialization).
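To make the first of these stages concrete, the minimal sketch below shows the cross-attention mechanism through which most text-to-image denoising networks ingest a text condition: image features act as queries, and text-encoder features act as keys and values. It is a toy illustration under assumed shapes and module names, not the architecture of any surveyed model.

```python
import torch
import torch.nn as nn

class CrossAttnDenoiser(nn.Module):
    """Toy eps-prediction network: the text condition enters via cross-attention."""
    def __init__(self, img_dim=64, txt_dim=77, width=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, width)   # noisy-latent features -> queries
        self.txt_proj = nn.Linear(txt_dim, width)   # text embeddings -> keys/values
        self.attn = nn.MultiheadAttention(width, num_heads=4, batch_first=True)
        self.time_mlp = nn.Sequential(nn.Linear(1, width), nn.SiLU(), nn.Linear(width, width))
        self.out = nn.Linear(width, img_dim)        # predict the noise residual

    def forward(self, x_t, t, text_emb):
        # x_t: (B, N, img_dim) flattened noisy latents; text_emb: (B, L, txt_dim)
        h = self.img_proj(x_t) + self.time_mlp(t[:, None].float())[:, None, :]
        ctx = self.txt_proj(text_emb)
        h, _ = self.attn(query=h, key=ctx, value=ctx)   # condition injected as K/V
        return self.out(h)

# Toy usage with random tensors standing in for real latents and embeddings.
net = CrossAttnDenoiser()
x_t = torch.randn(2, 16, 64)        # batch of flattened noisy latents
t = torch.randint(0, 1000, (2,))    # diffusion timesteps
text = torch.randn(2, 8, 77)        # stand-in for CLIP/T5 token embeddings
eps_hat = net(x_t, t, text)         # -> (2, 16, 64)
```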
Title | Task | Date | Publication |
---|---|---|---|
Vector quantized diffusion model for text-to-image synthesis | Text-to-image | 2021.11 | CVPR2022 |
High-resolution image synthesis with latent diffusion models | Text-to-image | 2021.12 | CVPR2022 |
GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models | Text-to-image | 2021.12 | ICML2022 |
Hierarchical text-conditional image generation with CLIP latents | Text-to-image | 2022.4 | ARXIV2022 |
Photorealistic text-to-image diffusion models with deep language understanding | Text-to-image | 2022.5 | NeurIPS2022 |
eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers | Text-to-image | 2022.11 | ARXIV2022 |
Title | Task | Date | Publication |
---|---|---|---|
GLIGEN: Open-set grounded text-to-image generation | Layout control | 2023.1 | CVPR2023 |
ELITE: Encoding visual concepts into textual embeddings for customized text-to-image generation | Customization | 2023.2 | CVPR2023 |
Mix-of-Show: Decentralized low-rank adaptation for multi-concept customization of diffusion models | Customization | 2023.5 | NeurIPS2023 |
DragonDiffusion: Enabling drag-style manipulation on diffusion models | Image editing | 2023.7 | ICLR2024 |
IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models | Visual signal to image, Image editing | 2023.8 | ARXIV2023 |
InteractDiffusion: Interaction control in text-to-image diffusion models | Layout control | 2023.12 | ARXIV2023 |
InstanceDiffusion: Instance-level control for image generation | Layout control | 2024.2 | CVPR2024 |
DEADiff: An efficient stylization diffusion model with disentangled representations | Image editing | 2024.3 | CVPR2024 |
Title | Task | Date | Publication |
---|---|---|---|
An image is worth one word: Personalizing text-to-image generation using textual inversion | Customization | 2022.8 | ICLR2023 |
Imagic: Text-based real image editing with diffusion models | Image editing | 2022.10 | CVPR2023 |
Uncovering the disentanglement capability in text-to-image diffusion models | Image editing | 2022.12 | CVPR2023 |
PRedItOR: Text guided image editing with diffusion prior | Image editing | 2023.2 | ARXIV2023 |
iEdit: Localised text-guided image editing with weak supervision | Image editing | 2023.5 | CVPR2024 |
Forgedit: Text guided image editing via learning and forgetting | Image editing | 2023.9 | ARXIV2023 |
Prompting hard or hardly prompting: Prompt inversion for text-to-image diffusion models | Image editing | 2023.12 | CVPR2024 |
The next figure illustrates the six conditioning mechanisms with an exemplary image editing process; a minimal code sketch of one of them, classifier-free guidance, follows the table below.
Title | Task | Date | Publication |
---|---|---|---|
Compositional visual generation with composable diffusion models | General approach | 2022.6 | ECCV2022 |
Classifier-free diffusion guidance | General approach | 2022.7 | ARXIV2022 |
SINE: Single image editing with text-to-image diffusion models | Image editing | 2022.12 | CVPR2023 |
MultiDiffusion: Fusing diffusion paths for controlled image generation | Multiple control | 2023.2 | ICML2023 |
PAIR-Diffusion: Object-level image editing with structure-and-appearance paired diffusion models | Image editing, Image composition | 2023.3 | ARXIV2023 |
MagicFusion: Boosting text-to-image generation performance by fusing diffusion models | Image composition | 2023.3 | ICCV2023 |
Effective real image editing with accelerated iterative diffusion inversion | Image editing | 2023.9 | ICCV2023 |
LEDITS++: Limitless image editing using text-to-image models | Image editing | 2023.11 | CVPR2024 |
NoiseCollage: A layout-aware text-to-image diffusion model based on noise cropping and merging | Image composition | 2024.3 | CVPR2024 |
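Among the mechanisms listed above, classifier-free guidance is the most widely used: at every sampling step the model blends a conditional and an unconditional noise prediction. The sketch below is a minimal illustration of that update; the `denoiser` callable and the embedding shapes are illustrative assumptions, not any particular codebase's API.

```python
import torch

@torch.no_grad()
def cfg_eps(denoiser, x_t, t, cond_emb, uncond_emb, guidance_scale=7.5):
    """Guided noise estimate: eps_u + s * (eps_c - eps_u)."""
    eps_cond = denoiser(x_t, t, cond_emb)      # prediction with the prompt
    eps_uncond = denoiser(x_t, t, uncond_emb)  # prediction with a null prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with a shape-compatible stand-in for a real denoising network.
x_t = torch.randn(2, 4, 8, 8)
t = torch.randint(0, 1000, (2,))
cond = torch.randn(2, 8, 77)                   # prompt embedding (assumed shape)
uncond = torch.zeros_like(cond)                # embedding of the empty prompt
toy = lambda x, t, c: 0.1 * x + c.mean()       # placeholder, not a real model
eps = cfg_eps(toy, x_t, t, cond, uncond)       # -> (2, 4, 8, 8)
```

In practice the two forward passes are usually batched together, and `guidance_scale` trades fidelity to the prompt against sample diversity.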
Title | Task | Date | Publication |
---|---|---|---|
SNIPS: Solving noisy inverse problems stochastically | Image restoration | 2021.5 | NeurIPS2021 |
Denoising diffusion restoration models | Image restoration | 2022.1 | NeurIPS2022 |
DriftRec: Adapting diffusion models to blind JPEG restoration | Image restoration | 2022.11 | TIP2024 |
Zero-shot image restoration using denoising diffusion null-space model | Image restoration | 2022.12 | ICLR2023 |
Image restoration with mean-reverting stochastic differential equations | Image restoration | 2023.1 | ICML2023 |
Inversion by direct iteration: An alternative to denoising diffusion for image restoration | Image restoration | 2023.3 | TMLR2023 |
ResShift: Efficient diffusion model for image super-resolution by residual shifting | Image restoration | 2023.7 | NeurIPS2023 |
SinSR: Diffusion-based image super-resolution in a single step | Image restoration | 2023.11 | CVPR2024 |
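Several of the restoration methods above enforce consistency with the degraded observation during sampling. As one hedged example, the sketch below reproduces the range-null-space decomposition at the heart of DDNM (the "denoising diffusion null-space model" entry): the measurement pins down the range-space component of the predicted clean image, while the diffusion prior fills the null space. The 1-D averaging operator `A` is an illustrative assumption.

```python
import torch

def null_space_refine(x0_pred, y, A, A_pinv):
    """x0_hat = A^+ y + (I - A^+ A) x0_pred, so that A @ x0_hat == y exactly."""
    range_part = A_pinv @ y                        # data-consistent component
    null_part = x0_pred - A_pinv @ (A @ x0_pred)   # prior fills the null space
    return range_part + null_part

# Toy 1-D example: A averages every 4 "pixels" (a 4x downsampling operator).
n = 16
A = torch.zeros(n // 4, n)
for i in range(n // 4):
    A[i, 4 * i:4 * i + 4] = 0.25
A_pinv = torch.linalg.pinv(A)
x0_true = torch.randn(n)
y = A @ x0_true                                    # degraded observation
x0_pred = torch.randn(n)                           # stand-in for a denoiser's x0 estimate
x0_hat = null_space_refine(x0_pred, y, A, A_pinv)
assert torch.allclose(A @ x0_hat, y, atol=1e-5)    # measurement consistency holds
```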