
A Survey on Conditional Image Synthesis with Diffusion Models


This repository accompanies our recently released survey, Conditional Image Synthesis with Diffusion Models: A Survey.

Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Fellow, IEEE and Can Wang

Abstract

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems to be solved in the future and suggest some possible solutions.

News!

📆2024-10-05: Our comprehensive survey paper, summarizing related methods published before October 1, 2024, is now available.

BibTeX

@article{zhan2024conditional,
  title={Conditional Image Synthesis with Diffusion Models: A Survey},
  author={Zhan, Zheyuan and Chen, Defang and Mei, Jian-Ping and Zhao, Zhenghe and Chen, Jiawei and Chen, Chun and Lyu, Siwei and Wang, Can},
  journal={arXiv preprint arXiv:2409.19365},
  year={2024}
}


Overview

The two figures below illustrate the taxonomy of diffusion-based conditional image synthesis (DCIS) used in this survey and the categorization of conditional image synthesis tasks, respectively.

Paper Structure

Conditional image synthesis with diffusion models

Conditional image synthesis tasks

tasks

Papers

The date in each table is the publication date of the first version of the paper on arXiv.

Condition Integration in Denoising Networks

This figure provides an exemplary workflow for building the desired denoising network for conditional synthesis tasks, including text-to-image, visual signal to image, and customization, via the three condition integration stages: training, re-purposing, and specialization.

Workflow

Condition Integration in the Training Stage

Conditional models for text-to-image (T2I)

| Title | Task | Date | Publication |
|---|---|---|---|
| Vector quantized diffusion model for text-to-image synthesis | Text-to-image | 2021.11 | CVPR2022 |
| High-resolution image synthesis with latent diffusion models | Text-to-image | 2021.12 | CVPR2022 |
| GLIDE: towards photorealistic image generation and editing with text-guided diffusion models | Text-to-image | 2021.12 | ICML2022 |
| Hierarchical text-conditional image generation with CLIP latents | Text-to-image | 2022.4 | ARXIV2022 |
| Photorealistic text-to-image diffusion models with deep language understanding | Text-to-image | 2022.5 | NeurIPS2022 |
| ediffi: Text-to-image diffusion models with an ensemble of expert denoisers | Text-to-image | 2022.11 | ARXIV2022 |
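As background for this table and the training-stage tables that follow: these models learn the condition during training, with the condition embedding fed to the denoising network at every step of the standard denoising objective. Below is a minimal, hedged PyTorch-style sketch of one such training step, assuming an epsilon-prediction parameterization; `denoiser` and `text_encoder` are generic placeholders, not the API of any specific model listed here.

```python
import torch
import torch.nn.functional as F

def conditional_training_step(denoiser, text_encoder, x0, prompt_tokens, alphas_cumprod):
    """One epsilon-prediction training step for a condition-aware denoiser.

    denoiser(x_t, t, cond) -> predicted noise; text_encoder(tokens) -> condition embeddings.
    Both callables are placeholders for whatever backbone/encoder a given paper uses.
    """
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    cond = text_encoder(prompt_tokens)           # the condition enters the network here
    pred = denoiser(x_t, t, cond)
    return F.mse_loss(pred, noise)                # simple denoising loss, conditioned on text
```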

Conditional Models for Image Restoration

| Title | Task | Date | Publication |
|---|---|---|---|
| Srdiff: Single image super-resolution with diffusion probabilistic models | Image restoration | 2021.4 | Neurocomputing2022 |
| Image super-resolution via iterative refinement | Image restoration | 2021.4 | TPAMI2022 |
| Cascaded diffusion models for high fidelity image generation | Image restoration | 2021.5 | JMLR2022 |
| Palette: Image-to-image diffusion models | Image restoration | 2021.11 | SIGGRAPH2022 |
| Denoising diffusion probabilistic models for robust image super-resolution in the wild | Image restoration | 2023.2 | ARXIV2023 |
| Resdiff: Combining cnn and diffusion model for image super-resolution | Image restoration | 2023.3 | AAAI2024 |
| Low-light image enhancement with wavelet-based diffusion models | Image restoration | 2023.6 | TOG2023 |
| Wavelet-based fourier information interaction with frequency diffusion adjustment for underwater image restoration | Image restoration | 2023.11 | CVPR2024 |
| Diffusion-based blind text image super-resolution | Image restoration | 2023.12 | CVPR2024 |
| Low-light image enhancement via clip-fourier guided wavelet diffusion | Image restoration | 2024.1 | ARXIV2024 |

Conditional Models for Other Synthesis Scenarios

| Title | Task | Date | Publication |
|---|---|---|---|
| Diffusion autoencoders: Toward a meaningful and decodable representation | Novel conditional control | 2021.11 | CVPR2022 |
| Semantic image synthesis via diffusion models | Visual feature map | 2022.6 | ARXIV2022 |
| A novel unified conditional score-based generative framework for multi-modal medical image completion | Medical image synthesis | 2022.7 | ARXIV2022 |
| A morphology focused diffusion probabilistic model for synthesis of histopathology images | Medical image synthesis | 2022.9 | WACV2023 |
| Humandiffusion: a coarse-to-fine alignment diffusion framework for controllable text-driven person image generation | Visual signal to image | 2022.11 | ARXIV2022 |
| Diffusion-based scene graph to image generation with masked contrastive pre-training | Graph to image | 2022.11 | ARXIV2022 |
| Dolce: A model-based probabilistic diffusion framework for limited-angle ct reconstruction | Medical image synthesis | 2022.11 | ICCV2023 |
| Zero-shot medical image translation via frequency-guided diffusion models | Image editing | 2023.4 | Trans. Med. Imaging 2023 |
| Learned representation-guided diffusion models for large-image generation | / | 2023.12 | ARXIV2023 |

Condition Integration in the Re-purposing Stage

Re-purposed Conditional Encoders

| Title | Task | Date | Publication |
|---|---|---|---|
| Pretraining is all you need for image-to-image translation | Visual signal to image | 2022.5 | ARXIV2022 |
| T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models | Visual signal to image | 2023.2 | AAAI2024 |
| Adding conditional control to text-to-image diffusion models | Visual signal to image | 2023.2 | ICCV2023 |
| Encoder-based domain tuning for fast personalization of text-to-image models | Customization | 2023.2 | TOG2023 |
| Pair-diffusion: Object-level image editing with structure-and-appearance paired diffusion models | Image editing, Image composition | 2023.3 | ARXIV2023 |
| Taming encoder for zero fine-tuning image customization with text-to-image diffusion models | Customization | 2023.4 | ARXIV2023 |
| Instantbooth: Personalized text-to-image generation without test-time finetuning | Customization | 2023.4 | CVPR2024 |
| Blip-diffusion: pre-trained subject representation for controllable text-to-image generation and editing | Customization | 2023.5 | NeurIPS2023 |
| Fastcomposer: Tuning-free multi-subject image generation with localized attention | Customization | 2023.5 | ARXIV2023 |
| Prompt-free diffusion: Taking "text" out of text-to-image diffusion models | Visual signal to image | 2023.5 | CVPR2024 |
| Paste, inpaint and harmonize via denoising: Subject-driven image editing with pre-trained diffusion model | Image composition | 2023.6 | ARXIV2023 |
| Subject-diffusion: Open domain personalized text-to-image generation without test-time fine-tuning | Customization, Layout control | 2023.7 | SIGGRAPH2024 |
| Imagebrush: Learning visual in-context instructions for exemplar-based image manipulation | Image editing | 2023.8 | NeurIPS2024 |
| Guiding instruction-based image editing via multimodal large language models | Image editing | 2023.9 | ARXIV2023 |
| Ranni: Taming text-to-image diffusion for accurate instruction following | Image editing | 2023.11 | ARXIV2023 |
| Smartedit: Exploring complex instruction-based image editing with multimodal large language models | Image editing | 2023.12 | ARXIV2023 |
| Instructany2pix: Flexible visual editing via multimodal instruction following | Image editing | 2023.12 | ARXIV2023 |
| Warpdiffusion: Efficient diffusion model for high-fidelity virtual try-on | Image composition | 2023.12 | ARXIV2023 |
| Coarse-to-fine latent diffusion for pose-guided person image synthesis | Customization | 2024.2 | CVPR2024 |
| Lightit: Illumination modeling and control for diffusion models | Visual signal to image | 2024.3 | CVPR2024 |
| Face2diffusion for fast and editable face personalization | Customization | 2024.3 | CVPR2024 |
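A pattern shared by many works in this table is to keep the pre-trained denoiser frozen and train an auxiliary conditional encoder whose multi-scale features are added to (or fused with) the backbone's intermediate activations. The sketch below illustrates that generic idea only; all module names and widths are illustrative assumptions, not the architecture of any particular paper.

```python
import torch.nn as nn

class ConditionalAdapter(nn.Module):
    """Illustrative adapter: encodes a spatial condition (e.g., a sketch or depth map)
    into multi-scale features to be added to a frozen denoiser's activations."""

    def __init__(self, cond_channels=3, widths=(320, 640, 1280)):
        super().__init__()
        stages, c_in = [], cond_channels
        for w in widths:
            stages.append(nn.Sequential(nn.Conv2d(c_in, w, 3, stride=2, padding=1), nn.SiLU()))
            c_in = w
        self.stages = nn.ModuleList(stages)

    def forward(self, cond_image):
        feats, h = [], cond_image
        for stage in self.stages:
            h = stage(h)
            feats.append(h)   # one residual feature map per backbone resolution
        return feats

# Training updates only the adapter; the pre-trained denoiser stays frozen and
# consumes feats[i] as additive residuals at the matching resolution.
```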

Condition Injection

| Title | Task | Date | Publication |
|---|---|---|---|
| GLIGEN: open-set grounded text-to-image generation | Layout control | 2023.1 | CVPR2023 |
| Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation | Customization | 2023.2 | CVPR2023 |
| Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models | Customization | 2023.5 | NeurIPS2024 |
| Dragondiffusion: Enabling drag-style manipulation on diffusion models | Image editing | 2023.7 | ICLR2024 |
| Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models | Visual signal to image, Image editing | 2023.8 | ARXIV2023 |
| Interactdiffusion: Interaction control in text-to-image diffusion models | Layout control | 2023.12 | ARXIV2023 |
| Instancediffusion: Instance-level control for image generation | Layout control | 2024.2 | CVPR2024 |
| Deadiff: An efficient stylization diffusion model with disentangled representations | Image editing | 2024.3 | CVPR2024 |
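Condition-injection methods typically feed new condition tokens (grounding boxes, image-prompt embeddings, etc.) directly into the attention layers, for instance by appending extra key/value tokens alongside the text tokens. A hedged, generic sketch of that idea, not any specific paper's implementation:

```python
import torch
import torch.nn.functional as F

def attention_with_injected_tokens(q, k_text, v_text, k_extra, v_extra):
    """q: (B, Nq, D) denoiser queries; (k_text, v_text): text-token keys/values;
    (k_extra, v_extra): extra condition tokens (e.g., grounding or image-prompt embeddings).
    The extra tokens are simply appended so the attention attends over both sets jointly."""
    k = torch.cat([k_text, k_extra], dim=1)
    v = torch.cat([v_text, v_extra], dim=1)
    return F.scaled_dot_product_attention(q, k, v)
```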

Backbone Fine-tuning

| Title | Task | Date | Publication |
|---|---|---|---|
| Instructpix2pix: Learning to follow image editing instructions | Image editing | 2022.11 | CVPR2023 |
| Paint by example: Exemplar-based image editing with diffusion models | Image composition | 2022.11 | CVPR2023 |
| Objectstitch: Object compositing with diffusion model | Image composition | 2022.12 | CVPR2023 |
| Smartbrush: Text and shape guided object inpainting with diffusion model | Image restoration | 2022.12 | CVPR2023 |
| Imagen editor and editbench: Advancing and evaluating text-guided image inpainting | Image restoration | 2022.12 | CVPR2023 |
| Reference-based image composition with sketch via structure-aware diffusion model | Image composition | 2023.3 | ARXIV2023 |
| Dialogpaint: A dialog-based image editing model | Image editing | 2023.3 | ARXIV2023 |
| Hive: Harnessing human feedback for instructional visual editing | Image editing | 2023.3 | CVPR2024 |
| Inst-inpaint: Instructing to remove objects with diffusion models | Image editing | 2023.4 | ARXIV2023 |
| Text-to-image editing by image information removal | Image editing | 2023.5 | WACV2024 |
| Magicbrush: A manually annotated dataset for instruction-guided image editing | Image editing | 2023.6 | NeurIPS2024 |
| Anydoor: Zero-shot object-level image customization | Image composition | 2023.7 | CVPR2024 |
| Instructdiffusion: A generalist modeling interface for vision tasks | Image editing | 2023.9 | ARXIV2023 |
| Emu edit: Precise image editing via recognition and generation tasks | Image editing | 2023.11 | CVPR2024 |
| Dreaminpainter: Text-guided subject-driven image inpainting with diffusion models | Image composition | 2023.12 | ARXIV2023 |
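A recurring recipe in this table (instruction-based editing, exemplar-based editing, inpainting) is to widen the denoiser's input so that an encoded condition image and/or mask is concatenated channel-wise with the noisy latent, and then fine-tune the backbone on paired data. A minimal sketch of that input preparation, assuming latent-space inputs; the function name is illustrative:

```python
import torch

def prepare_finetune_input(noisy_latent, cond_latent, mask=None):
    """Channel-wise concatenation of the noisy latent with an encoded condition image
    (and optionally a mask). The backbone's first conv layer is widened to accept it."""
    parts = [noisy_latent, cond_latent]
    if mask is not None:
        parts.append(mask)
    return torch.cat(parts, dim=1)
```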

Condition Integration in the Specialization Stage

Conditional Projection

| Title | Task | Date | Publication |
|---|---|---|---|
| An image is worth one word: Personalizing text-to-image generation using textual inversion | Customization | 2022.8 | ICLR2023 |
| Imagic: Text-based real image editing with diffusion models | Image editing | 2022.10 | CVPR2023 |
| Uncovering the disentanglement capability in text-to-image diffusion models | Image editing | 2022.12 | CVPR2023 |
| Preditor: Text guided image editing with diffusion prior | Image editing | 2023.2 | ARXIV2023 |
| iedit: Localised text-guided image editing with weak supervision | Image editing | 2023.5 | CVPR2024 |
| Forgedit: Text guided image editing via learning and forgetting | Image editing | 2023.9 | ARXIV2023 |
| Prompting hard or hardly prompting: Prompt inversion for text-to-image diffusion models | Image editing | 2023.12 | CVPR2024 |
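Conditional-projection methods keep the model frozen and optimize the condition itself, e.g., learning a new token embedding that reconstructs a handful of reference images (textual inversion) or refining a text embedding toward a target edit. A hedged sketch of the token-optimization loop; `encode_prompt_with` is a hypothetical helper that splices the learned vector into the prompt tokens, and the embedding width is an assumption:

```python
import torch
import torch.nn.functional as F

def learn_concept_embedding(denoiser, encode_prompt_with, ref_latents, alphas_cumprod,
                            steps=1000, lr=5e-3):
    """Optimize a single embedding vector v_star so prompts containing it reconstruct
    the reference latents; the denoiser and text encoder stay frozen."""
    v_star = torch.randn(768, requires_grad=True)   # assumed embedding width
    opt = torch.optim.Adam([v_star], lr=lr)
    for _ in range(steps):
        x0 = ref_latents[torch.randint(len(ref_latents), (1,))]
        t = torch.randint(0, alphas_cumprod.shape[0], (1,))
        noise = torch.randn_like(x0)
        a_bar = alphas_cumprod[t].view(1, 1, 1, 1)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
        loss = F.mse_loss(denoiser(x_t, t, encode_prompt_with(v_star)), noise)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return v_star
```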

Testing-time Model Fine-Tuning

| Title | Task | Date | Publication |
|---|---|---|---|
| Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation | Customization | 2022.8 | CVPR2023 |
| Imagic: Text-based real image editing with diffusion models | Image editing | 2022.10 | CVPR2023 |
| Unitune: Text-driven image editing by fine tuning a diffusion model on a single image | Image editing | 2022.10 | TOG2023 |
| Multi-concept customization of text-to-image diffusion | Customization | 2022.12 | CVPR2023 |
| Sine: Single image editing with text-to-image diffusion models | Image editing | 2022.12 | CVPR2023 |
| Encoder-based domain tuning for fast personalization of text-to-image models | Customization | 2023.2 | TOG2023 |
| Svdiff: Compact parameter space for diffusion fine-tuning | Customization | 2023.3 | ICCV2023 |
| Cones: concept neurons in diffusion models for customized generation | Customization | 2023.3 | ICML2023 |
| Custom-edit: Text-guided image editing with customized diffusion models | Customization | 2023.5 | ARXIV2023 |
| Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models | Customization | 2023.5 | NeurIPS2024 |
| Layerdiffusion: Layered controlled image editing with diffusion models | Image editing | 2023.5 | SIGGRAPH Asia2023 |
| Cones 2: Customizable image synthesis with multiple subjects | Customization | 2023.5 | NeurIPS2023 |
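Testing-time fine-tuning methods adapt the backbone (or a low-rank / compact subset of its weights) on a few subject or source images, often adding a prior-preservation term so the model does not drift from its original behavior. A hedged sketch of such a combined loss; the batch layout and names are illustrative assumptions:

```python
import torch.nn.functional as F

def subject_finetune_loss(denoiser, subject_batch, prior_batch, w_prior=1.0):
    """subject_batch / prior_batch: dicts holding noisy latents `x_t`, timesteps `t`,
    target noise `eps`, and condition embeddings `cond`, prepared as in ordinary training."""
    loss_subject = F.mse_loss(
        denoiser(subject_batch["x_t"], subject_batch["t"], subject_batch["cond"]),
        subject_batch["eps"])
    # Prior preservation: also denoise samples generated by the *original* model
    # for the generic class prompt, to retain its prior.
    loss_prior = F.mse_loss(
        denoiser(prior_batch["x_t"], prior_batch["t"], prior_batch["cond"]),
        prior_batch["eps"])
    return loss_subject + w_prior * loss_prior
```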

Condition Integration in the Sampling Process

The figure below illustrates the six conditioning mechanisms in the sampling process with an exemplary image editing workflow.

Sampling

Inversion

| Title | Task | Date | Publication |
|---|---|---|---|
| Sdedit: Guided image synthesis and editing with stochastic differential equations | Image editing, Visual signal to image | 2021.8 | ICLR2022 |
| Dual diffusion implicit bridges for image-to-image translation | Image editing, Visual signal to image | 2022.3 | ICLR2023 |
| Null-text inversion for editing real images using guided diffusion models | Image editing | 2022.11 | CVPR2023 |
| Edict: Exact diffusion inversion via coupled transformations | Image editing | 2022.11 | CVPR2023 |
| A latent space of stochastic diffusion models for zero-shot image editing and guidance | Image editing | 2022.11 | ICCV2023 |
| Inversion-based style transfer with diffusion models | Image editing | 2022.11 | CVPR2023 |
| An edit friendly ddpm noise space: Inversion and manipulations | Image editing | 2023.4 | ARXIV2023 |
| Prompt tuning inversion for text-driven image editing using diffusion models | Image editing | 2023.5 | ICCV2023 |
| Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models | Image editing | 2023.5 | ARXIV2023 |
| Dragdiffusion: Harnessing diffusion models for interactive point-based image editing | Image editing | 2023.6 | CVPR2024 |
| Tf-icon: Diffusion-based training-free cross-domain image composition | Image editing | 2023.7 | ICCV2023 |
| Stylediffusion: Controllable disentangled style transfer via diffusion models | Image editing | 2023.8 | ICCV2023 |
| Kv inversion: Kv embeddings learning for text-conditioned real image action editing | Image editing | 2023.9 | PRCV2023 |
| Effective real image editing with accelerated iterative diffusion inversion | Image editing | 2023.9 | ICCV2023 |
| Direct inversion: Boosting diffusion-based editing with 3 lines of code | Image editing | 2023.10 | ARXIV2023 |
| Ledits++: Limitless image editing using text-to-image models | Image editing | 2023.11 | CVPR2024 |
| The blessing of randomness: Sde beats ode in general diffusion-based image editing | Image editing | 2023.11 | ICLR2024 |
| Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer | Image editing | 2023.12 | CVPR2024 |
| Fixed-point inversion for text-to-image diffusion models | Image editing | 2023.12 | ARXIV2023 |
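Inversion-based editing first maps a real image back to a noise latent by running a (near-)deterministic sampler in reverse, then regenerates from that latent under the target condition. A minimal DDIM-style inversion sketch, assuming an epsilon-prediction denoiser; scheduler details vary from paper to paper:

```python
import torch

@torch.no_grad()
def ddim_invert(denoiser, x0, cond, alphas_cumprod, timesteps):
    """Run the deterministic DDIM update forward in time: x_0 -> (approximate) x_T.
    `timesteps` is an increasing list of integer steps; denoiser(x, t, cond) predicts noise."""
    x = x0
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = denoiser(x, torch.tensor([t]), cond)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # predicted clean latent
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps    # deterministic step toward more noise
    return x   # sampling back from this latent with an edited condition performs the edit
```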

Attention Manipulation

| Title | Task | Date | Publication |
|---|---|---|---|
| Prompt-to-prompt image editing with cross attention control | Image editing | 2022.8 | ICLR2023 |
| Plug-and-play diffusion features for text-driven image-to-image translation | Image editing | 2022.11 | CVPR2023 |
| ediffi: Text-to-image diffusion models with an ensemble of expert denoisers | Layout control | 2022.11 | ARXIV2022 |
| Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing | Image editing | 2023.4 | ICCV2023 |
| Custom-edit: Text-guided image editing with customized diffusion models | Customization | 2023.5 | ARXIV2023 |
| Cones 2: Customizable image synthesis with multiple subjects | Customization | 2023.5 | NeurIPS2023 |
| Dragdiffusion: Harnessing diffusion models for interactive point-based image editing | Image editing | 2023.6 | CVPR2024 |
| Tf-icon: Diffusion-based training-free cross-domain image composition | Image editing | 2023.7 | ICCV2023 |
| Dragondiffusion: Enabling drag-style manipulation on diffusion models | Image editing | 2023.7 | ICLR2024 |
| Stylediffusion: Controllable disentangled style transfer via diffusion models | Image editing | 2023.8 | ICCV2023 |
| Face aging via diffusion-based editing | Image editing | 2023.9 | BMVC2023 |
| Dynamic prompt learning: Addressing cross-attention leakage for text-based image editing | Image editing | 2023.9 | NeurIPS2024 |
| Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer | Image editing | 2023.12 | CVPR2024 |
| Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation | Image editing | 2023.12 | ARXIV2023 |
| Towards understanding cross and self-attention in stable diffusion for text-guided image editing | Image editing | 2024.3 | CVPR2024 |
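Attention-manipulation methods steer sampling by storing, reweighting, or replacing attention maps between a source pass and a target pass (as in prompt-to-prompt-style editing). A schematic sketch of map replacement inside one attention call; this is an illustration of the idea, not any specific codebase's hook API:

```python
import torch

def attention_with_map_control(q, k, v, stored_attn=None, replace=False):
    """Compute attention; optionally replace the current attention map with one stored
    from a source-image pass, so layout/structure is preserved while content changes."""
    d = q.shape[-1]
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    if replace and stored_attn is not None:
        attn = stored_attn                 # inject the source pass's attention map
    return attn @ v, attn                  # return attn so it can be stored for later passes
```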

Noise Blending

| Title | Task | Date | Publication |
|---|---|---|---|
| Compositional visual generation with composable diffusion models | General approach | 2022.6 | ECCV2022 |
| Classifier-free diffusion guidance | / | 2022.7 | ARXIV2022 |
| Sine: Single image editing with text-to-image diffusion models | Image editing | 2022.12 | CVPR2023 |
| Multidiffusion: Fusing diffusion paths for controlled image generation | Multiple control | 2023.2 | ICML2023 |
| Pair-diffusion: Object-level image editing with structure-and-appearance paired diffusion models | Image editing, Image composition | 2023.3 | ARXIV2023 |
| Magicfusion: Boosting text-to-image generation performance by fusing diffusion models | Image composition | 2023.3 | ICCV2023 |
| Effective real image editing with accelerated iterative diffusion inversion | Image editing | 2023.9 | ICCV2023 |
| Ledits++: Limitless image editing using text-to-image models | Image editing | 2023.11 | CVPR2024 |
| Noisecollage: A layout-aware text-to-image diffusion model based on noise cropping and merging | Image composition | 2024.3 | CVPR2024 |
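Noise blending combines the noise predictions of several conditional (or unconditional) branches at each sampling step; classifier-free guidance is the canonical two-branch case. A short sketch of both the two-branch and the more general weighted form:

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=7.5):
    # eps = eps_uncond + w * (eps_cond - eps_uncond); w > 1 strengthens the condition.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def blend_noise_predictions(eps_list, weights):
    # More generally, several condition-specific predictions can be combined with
    # per-condition weights (compositional or multi-region control).
    out = 0
    for eps, w in zip(eps_list, weights):
        out = out + w * eps
    return out
```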

Revising Diffusion Process

| Title | Task | Date | Publication |
|---|---|---|---|
| Snips: Solving noisy inverse problems stochastically | Image restoration | 2021.5 | NeurIPS2021 |
| Denoising diffusion restoration models | Image restoration | 2022.1 | NeurIPS2022 |
| Driftrec: Adapting diffusion models to blind jpeg restoration | Image restoration | 2022.11 | TIP2024 |
| Zero-shot image restoration using denoising diffusion null-space model | Image restoration | 2022.12 | ICLR2024 |
| Image restoration with mean-reverting stochastic differential equations | Image restoration | 2023.1 | ICML2023 |
| Inversion by direct iteration: An alternative to denoising diffusion for image restoration | Image restoration | 2023.3 | TMLR2023 |
| Resshift: Efficient diffusion model for image super-resolution by residual shifting | Image restoration | 2023.7 | NeurIPS2024 |
| Sinsr: diffusion-based image super-resolution in a single step | Image restoration | 2023.11 | CVPR2024 |
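Several restoration methods revise the diffusion/denoising process itself so that each intermediate estimate stays consistent with a known degradation y = A x. One well-known instance is a range/null-space decomposition applied to the predicted clean image at every step; the sketch below uses an explicit dense pseudo-inverse purely for illustration, whereas practical implementations use efficient operators:

```python
import torch

def project_to_measurements(x0_pred, y, A):
    """x0_pred: (n,) flattened clean-image estimate; y = A @ x_true: (m,) measurement;
    A: (m, n) degradation operator (dense here only for illustration).
    Keeps the model's null-space component and replaces the range-space one with A^+ y."""
    A_pinv = torch.linalg.pinv(A)
    return A_pinv @ y + x0_pred - A_pinv @ (A @ x0_pred)
```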

Guidance

| Title | Task | Date | Publication |
|---|---|---|---|
| Diffusion models beat gans on image synthesis | Text-to-image | 2021.5 | NeurIPS2021 |
| Blended diffusion for text-driven editing of natural images | Image restoration | 2021.11 | CVPR2022 |
| More control for free! image synthesis with semantic diffusion guidance | Text/Image-to-image | 2021.12 | WACV2023 |
| Improving diffusion models for inverse problems using manifold constraints | Image restoration | 2022.6 | NeurIPS2022 |
| Diffusion posterior sampling for general noisy inverse problems | Image restoration | 2022.9 | ICLR2023 |
| Diffusion-based image translation using disentangled style and content representation | Image editing | 2022.9 | ICLR2023 |
| Sketch-guided text-to-image diffusion models | Visual signal to image | 2022.11 | SIGGRAPH2023 |
| High-fidelity guided image synthesis with latent diffusion models | Visual signal to image | 2022.11 | CVPR2023 |
| Parallel diffusion models of operator and image for blind inverse problems | Image restoration | 2022.11 | CVPR2023 |
| Zero-shot image-to-image translation | Image editing | 2023.2 | SIGGRAPH2023 |
| Universal guidance for diffusion models | General guidance framework | 2023.2 | CVPR2023 |
| Pseudoinverse-guided diffusion models for inverse problems | Image restoration | 2023.2 | ICLR2023 |
| Freedom: Training-free energy-guided conditional diffusion model | General guidance framework | 2023.3 | ICCV2023 |
| Training-free layout control with cross-attention guidance | Layout control | 2023.4 | WACV2024 |
| Generative diffusion prior for unified image restoration and enhancement | Image restoration | 2023.4 | CVPR2023 |
| Regeneration learning of diffusion models with rich prompts for zero-shot image translation | Image editing | 2023.5 | ARXIV2023 |
| Diffusion self-guidance for controllable image generation | Image editing | 2023.6 | NeurIPS2024 |
| Energy-based cross attention for bayesian context update in text-to-image diffusion models | Image editing | 2023.6 | NeurIPS2024 |
| Solving linear inverse problems provably via posterior sampling with latent diffusion models | Image restoration | 2023.7 | NeurIPS2024 |
| Dragondiffusion: Enabling drag-style manipulation on diffusion models | Image editing | 2023.7 | ICLR2024 |
| Readout guidance: Learning control from diffusion features | Visual signal to image | 2023.12 | CVPR2024 |
| Freecontrol: Training-free spatial control of any text-to-image diffusion model with any condition | Visual signal to image | 2023.12 | CVPR2024 |
| Diffeditor: Boosting accuracy and flexibility on diffusion-based image editing | Image editing | 2024.2 | CVPR2024 |
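Guidance methods perturb each sampling step with the gradient of a differentiable criterion (a classifier, a CLIP or energy score, a measurement-consistency loss, etc.) evaluated on the current estimate. A generic, hedged sketch of loss-gradient guidance applied to the predicted clean image; exact weightings differ from paper to paper, and `loss_fn` is a placeholder that must return a scalar:

```python
import torch

def guided_noise_prediction(denoiser, x_t, t, cond, a_bar_t, loss_fn, scale=1.0):
    """a_bar_t: cumulative alpha at step t (scalar tensor).
    loss_fn(x0_pred) measures how well the estimate satisfies the condition,
    e.g. ||y - A(x0_pred)||^2 for restoration or a classifier/CLIP energy."""
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t, cond)
    x0_pred = (x_t - (1 - a_bar_t).sqrt() * eps) / a_bar_t.sqrt()
    grad = torch.autograd.grad(loss_fn(x0_pred), x_t)[0]
    # Shift the noise prediction along the gradient so the sampler drifts toward lower loss.
    return eps + scale * (1 - a_bar_t).sqrt() * grad
```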

Conditional Correction

| Title | Task | Date | Publication |
|---|---|---|---|
| Score-based generative modeling through stochastic differential equations | Image restoration | 2020.11 | ICLR2021 |
| ILVR: conditioning method for denoising diffusion probabilistic models | Image restoration | 2021.8 | ICCV2021 |
| Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction | Image restoration | 2021.12 | CVPR2022 |
| Repaint: Inpainting using denoising diffusion probabilistic models | Image restoration | 2022.1 | CVPR2022 |
| Improving diffusion models for inverse problems using manifold constraints | Image restoration | 2022.6 | NeurIPS2022 |
| Diffedit: Diffusion-based semantic image editing with mask guidance | Image editing | 2022.10 | ICLR2023 |
| Region-aware diffusion for zero-shot text-driven image editing | Image editing | 2023.2 | ARXIV2023 |
| Localizing object-level shape variations with text-to-image diffusion models | Image editing | 2023.3 | ICCV2023 |
| Instructedit: Improving automatic masks for diffusion-based image editing with user instructions | Image editing | 2023.5 | ARXIV2023 |
| Text-driven image editing via learnable regions | Image editing | 2023.11 | CVPR2024 |
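Conditional correction projects the intermediate sample back onto the condition after each denoising step, e.g., re-imposing the known pixels (inpainting-style) or the reference's low-frequency content from an image noised to the same level. A minimal sketch of the masked-region case, under the usual forward-diffusion assumption:

```python
import torch

def correct_with_known_region(x_t, y0, mask, t, alphas_cumprod):
    """x_t: current sample; y0: reference/clean image; mask: 1 where pixels are known.
    Replaces the known region of x_t with the reference noised to the same timestep,
    so sampling only has to synthesize the unknown region."""
    a_bar = alphas_cumprod[t]
    y_t = a_bar.sqrt() * y0 + (1 - a_bar).sqrt() * torch.randn_like(y0)
    return mask * y_t + (1 - mask) * x_t
```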
