Task: Image Generation
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128x128, 4.59 on ImageNet 256x256, and 7.72 on ImageNet 512x512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256x256 and 3.85 on ImageNet 512x512.
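To make the guidance mechanism concrete, below is a minimal sketch of the classifier-guided reverse step described in the paper: the predicted reverse-process mean is shifted by the classifier gradient, scaled by the classifier guidance scale (CGS). The function names and the noise-conditional classifier(x_t, t) interface are illustrative assumptions, not MMagic's internal API.

import torch

def classifier_grad(classifier, x_t, t, y):
    # grad_x log p(y | x_t, t) from a noise-conditional classifier (assumed interface)
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[torch.arange(len(y)), y].sum()
        return torch.autograd.grad(selected, x_in)[0]

def guided_mean(mean, variance, x_t, t, y, classifier, classifier_scale=1.0):
    # Shift the reverse-process mean by CGS * Sigma * grad_x log p(y | x_t);
    # larger classifier_scale trades diversity for fidelity.
    return mean + classifier_scale * variance * classifier_grad(classifier, x_t, t, y)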
ImageNet
| Model | Dataset | Scheduler | Steps | CGS | Time (A100) | FID-Full-50K | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| adm_ddim250_8xb32_imagenet-64x64 | ImageNet 64x64 | DDIM | 250 | - | 1 h | 3.2284 | ckpt |
| adm-g_ddim25_8xb32_imagenet-64x64 | ImageNet 64x64 | DDIM | 25 | 1.0 | 2 h | 3.7566 | ckpt |
| adm_ddim250_8xb32_imagenet-256x256 | ImageNet 256x256 | DDIM | 250 | - | - | - | ckpt |
| adm-g_ddim25_8xb32_imagenet-256x256 | ImageNet 256x256 | DDIM | 25 | 1.0 | - | - | ckpt |
| adm_ddim250_8xb32_imagenet-512x512 | ImageNet 512x512 | DDIM | 250 | - | - | - | ckpt |
| adm-g_ddim25_8xb32_imagenet-512x512 | ImageNet 512x512 | DDIM | 25 | 1.0 | - | - | ckpt |

CGS is the classifier guidance scale used at sampling time; "-" means the model is sampled without classifier guidance.
Infer
Infer Instructions
You can run ADM as follows:
from mmengine import Config, MODELS
from mmengine.registry import init_default_scope
from torchvision.utils import save_image
init_default_scope('mmagic')
# sampling with classifier guidance, CGS=1.0
config = 'configs/guided_diffusion/adm-g_ddim25_8xb32_imagenet-64x64.py'
ckpt_path = 'https://download.openmmlab.com/mmediting/guided_diffusion/adm-g_8xb32_imagenet-64x64-2c0fbeda.pth' # noqa
model_cfg = Config.fromfile(config).model
model_cfg.pretrained_cfgs = dict(unet=dict(ckpt_path=ckpt_path, prefix='unet'),
                                 classifier=dict(ckpt_path=ckpt_path, prefix='classifier'))
model = MODELS.build(model_cfg).cuda().eval()
samples = model.infer(
    init_image=None,
    batch_size=4,
    num_inference_steps=25,
    labels=333,
    classifier_scale=1.0,
    show_progress=True)['samples']
# sampling without classifier guidance
config = 'configs/guided_diffusion/adm_ddim250_8xb32_imagenet-64x64.py'
ckpt_path = 'https://download.openmmlab.com/mmediting/guided_diffusion/adm-u-cvt-rgb_8xb32_imagenet-64x64-7ff0080b.pth' # noqa
model_cfg = Config.fromfile(config).model
model_cfg.pretrained_cfgs = dict(unet=dict(ckpt_path=ckpt_path, prefix='unet'))
model = MODELS.build(model_cfg).cuda().eval()
samples = model.infer(
    init_image=None,
    batch_size=4,
    num_inference_steps=250,
    labels=None,
    classifier_scale=0.0,
    show_progress=True)['samples']
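The save_image import at the top can be used to write the sampled batch to disk. A hedged example follows; the normalize/value_range settings assume outputs roughly in [-1, 1], so adjust them if the returned tensors are already in [0, 1].

# write the sampled batch as a 2x2 image grid (assumed output range [-1, 1])
save_image(samples, 'samples.png', nrow=2, normalize=True, value_range=(-1, 1))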
Test
Test Instructions
You can use the following commands to test a model on CPU or with single/multiple GPUs.
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/guided_diffusion/adm-u_ddim250_8xb32_imagenet-64x64.py https://download.openmmlab.com/mmgen/guided_diffusion/adm-u-cvt-rgb_8xb32_imagenet-64x64-7ff0080b.pth
# single-gpu test
python tools/test.py configs/guided_diffusion/adm-u_ddim250_8xb32_imagenet-64x64.py https://download.openmmlab.com/mmgen/guided_diffusion/adm-u-cvt-rgb_8xb32_imagenet-64x64-7ff0080b.pth
# multi-gpu test (8 GPUs)
./tools/dist_test.sh configs/guided_diffusion/adm-u_ddim250_8xb32_imagenet-64x64.py https://download.openmmlab.com/mmgen/guided_diffusion/adm-u-cvt-rgb_8xb32_imagenet-64x64-7ff0080b.pth 8
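The same commands apply to the classifier-guided configs listed in the table above. For example, a single-GPU test of the guided 64x64 model could reuse the config and checkpoint from the inference snippet; this is shown as an illustration rather than a command verified against the original docs.

# single-gpu test of the classifier-guided 64x64 model (illustrative)
python tools/test.py configs/guided_diffusion/adm-g_ddim25_8xb32_imagenet-64x64.py https://download.openmmlab.com/mmediting/guided_diffusion/adm-g_8xb32_imagenet-64x64-2c0fbeda.pth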
For more details, you can refer to the "Test a pre-trained model" section in train_test.md.
Citation

@article{PrafullaDhariwal2021DiffusionMB,
  title={Diffusion Models Beat GANs on Image Synthesis},
  author={Prafulla Dhariwal and Alex Nichol},
  journal={arXiv preprint arXiv:2105.05233},
  year={2021}
}