Exploring Warping-Guided Features via Adaptive Latent Diffusion Model for Virtual Try-On (ICME 2024)
ALDM is a novel adaptive latent diffusion model that produces warping-guided features before generating target images. It contains two modules: 1) a prior warping module (PWM) and 2) an adaptive alignment module (AAM). ALDM is a denoising diffusion model that strives to generate a target image of a person accurately attired in a reference garment, given a source image (𝐼) and the reference clothing (𝐼𝑟).
Create a conda environment & install requirements
conda create -n ALDM python==3.9.0
conda activate ALDM
cd ALDM-main # or your path to project dir
pip install -r requirements.yaml
!!! Remember to modify the dataset path and the pre-trained weights path.
Before training, download the VITON-HD or DressCode dataset. Once the datasets are downloaded, the folder structure should look like this:
├── VITON-HD
│ ├── train_pairs_unpaired.txt
│ ├── train
│ │ ├── image
│ │ │ ├── [000006_00.jpg | 000008_00.jpg | ...]
│ │ ├── cloth
│ │ │ ├── [000006_00.jpg | 000008_00.jpg | ...]
│ │ ├── agnostic-mask
│ │ │ ├── [000006_00_mask.png | 000008_00.png | ...]
...
├── DressCode
│ ├── trainpairs_paired.txt
│ ├── [dresses | lower_body | upper_body]
│ │ ├── train_pairs_paired.txt
│ │ ├── train_pairs_unpaired.txt
│ │ ├── images
│ │ │ ├── [013563_0.jpg | 013563_1.jpg | 013564_0.jpg | 013564_1.jpg | ...]
│ │ ├── agnostic_masks
│ │ │ ├── [013563_0.png | 013564_0.png | ...]
...
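A quick way to catch path mistakes before training is to check the layout programmatically. The following is a minimal sketch, not part of the released code: the helper name `missing_entries` is hypothetical, and it only checks the train-split entries shown in the trees above (anything elided by `...` and the top-level DressCode pair list are not checked).

```python
from pathlib import Path

# Expected relative paths, copied from the folder trees above.
VITON_HD = [
    "train_pairs_unpaired.txt",
    "train/image",
    "train/cloth",
    "train/agnostic-mask",
]
DRESSCODE_CATEGORIES = ["dresses", "lower_body", "upper_body"]
DRESSCODE_PER_CATEGORY = [
    "train_pairs_paired.txt",
    "train_pairs_unpaired.txt",
    "images",
    "agnostic_masks",
]


def missing_entries(root, dataset):
    """Return the expected relative paths that do not exist under `root`.

    `missing_entries` is a hypothetical helper for illustration only.
    """
    root = Path(root)
    if dataset == "VITON-HD":
        expected = list(VITON_HD)
    elif dataset == "DressCode":
        expected = [
            f"{category}/{entry}"
            for category in DRESSCODE_CATEGORIES
            for entry in DRESSCODE_PER_CATEGORY
        ]
    else:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_entries("/path/to/VITON-HD", "VITON-HD")` with your own dataset root returns an empty list when every listed entry is present.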
To run training and inference on the DressCode or VITON-HD dataset, run the following commands.
Train PWM:
$ cd ./PWM
$ sh train.sh
Test PWM:
$ python3 stage1_batchtest_prior_model.py
Train ALDM:
$ cd ../ALDM
$ sh my_train.sh
Test ALDM:
$ python3 test.py
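The four stages above can also be chained from a single script. This is a minimal sketch under stated assumptions: `run_pipeline` and `STEPS` are hypothetical names, and `PWM` and `ALDM` are assumed to be sibling folders under the project root, as the `cd` commands imply. By default it only reports what it would run.

```python
import subprocess
from pathlib import Path

# (sub-folder of the project root, command) in the order given above.
STEPS = [
    ("PWM", ["sh", "train.sh"]),
    ("PWM", ["python3", "stage1_batchtest_prior_model.py"]),
    ("ALDM", ["sh", "my_train.sh"]),
    ("ALDM", ["python3", "test.py"]),
]


def run_pipeline(project_root=".", dry_run=True):
    """Return the planned (cwd, command) pairs; execute them unless dry_run.

    With dry_run=False each command runs in its folder, and check=True
    raises CalledProcessError at the first failing stage.
    """
    root = Path(project_root)
    planned = []
    for subdir, cmd in STEPS:
        cwd = root / subdir
        planned.append((str(cwd), " ".join(cmd)))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned
```

Run `run_pipeline()` first to inspect the plan, then `run_pipeline(dry_run=False)` once the dataset and weight paths have been set.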
Our code is built on Diffusers, and we adopt Stable Diffusion v2.1 as the base model. PWM uses OpenCLIP ViT-H/14 as its image encoder, and ALDM uses DINOv2-G/14.
All the materials, including code, checkpoints, and demo, are made available under the Creative Commons BY-NC-SA 4.0 license. You are free to copy, redistribute, remix, transform, and build upon the project for non-commercial purposes, as long as you give appropriate credit and distribute your contributions under the same license.