
[NeurIPS'23] An efficient PyTorch-based library for training 3D-aware image synthesis models.

Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

Figure: Overview of our modularized pipeline for 3D-aware image synthesis, which decomposes the generation process into universal modules. Each module can be improved independently, facilitating algorithm development. Note that the discriminator is omitted for simplicity.

Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase
Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Sida Peng, Yujun Shen
NeurIPS 2023 Datasets and Benchmarks Track

[Paper]

Overview of the methods and modules supported by our codebase:

Supported Methods (9): GRAF, GIRAFFE, π-GAN, StyleSDF, StyleNeRF, VolumeGAN, GRAM, EpiGRAF, and EG3D.

Supported Modules (8), with options including:
  • Deterministic Pose Sampling, Uncertainty Pose Sampling, ...
  • Uniform, Normal, Fixed, ...
  • Tri-plane, Volume, MLP, MPI, ...
  • LeakyReLU, Softplus, SIREN, ReLU, ...
  • Occupancy, Density, Feature, Color, SDF, ...
  • color, geometry, ...
  • FID, ID, RE, DE, PE, ...

Installation

Our code is tested with Python 3.8, CUDA 11.3, and PyTorch 1.11.0.

  1. Install package requirements via conda:

    conda create -n <ENV_NAME> python=3.8  # create virtual environment with Python 3.8
    conda activate <ENV_NAME>
    conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch # install PyTorch 1.11.0
    pip install -r requirements.txt # install dependencies
  2. Our code requires nvdiffrast; please refer to its documentation for instructions on how to install it (a typical install command is sketched after this list).

  3. Our code also uses a face reconstruction model to evaluate metrics. Please refer to this guide to prepare the prerequisite models.

  4. To use a video visualizer (optional), please also install ffmpeg.

    • Ubuntu: sudo apt-get install ffmpeg.
    • macOS: brew install ffmpeg.
  5. To reduce the memory footprint (optional), you can switch from the default memory allocator to either jemalloc (recommended) or tcmalloc (a usage sketch follows this list).

    • jemalloc (recommended):
      • Ubuntu: sudo apt-get install libjemalloc
    • tcmalloc:
      • Ubuntu: sudo apt-get install google-perftools

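For step 5, the allocator can be switched by preloading the library when launching a job. The library path below is an assumption and varies across distributions and package versions; check where your package manager installed libjemalloc.

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA>
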
Preparing datasets

FFHQ and ShapeNet Cars: Please refer to this guide to prepare the datasets.

Cats: Please refer to this guide to prepare the dataset.

Quick demo

Train EG3D on FFHQ at a Resolution of 512x512

In your Terminal, run:

./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> [OPTIONS]

where

  • <NUM_GPUS> refers to the number of GPUs. Setting <NUM_GPUS> to 1 launches a training job on single-GPU platforms.

  • <PATH_TO_DATA> refers to the path of the FFHQ dataset (at a resolution of 512x512) in zip format. If running on a local machine, a soft link to the data will be created under the data folder of the working directory to save disk space.

  • [OPTIONS] refers to any additional options to pass. Detailed instructions on the available options can be shown via ./scripts/training_demos/eg3d_ffhq512.sh <NUM_GPUS> <PATH_TO_DATA> --help.

This demo script uses eg3d_ffhq512 as the default value of job_name, which identifies the experiment. Concretely, a directory named job_name will be created under the root working directory (work_dirs/ by default). To prevent overwriting previous experiments, an exception is raised to interrupt training if the job_name directory already exists. To change the job name, use the --job_name=<NEW_JOB_NAME> option, as in the example below.
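
For instance, a launch on 8 GPUs with a custom job name could look as follows; the GPU count, dataset path, and job name here are placeholders to adapt to your setup.

./scripts/training_demos/eg3d_ffhq512.sh 8 /path/to/ffhq512.zip --job_name=<NEW_JOB_NAME>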

Other 3D GAN models reproduced by our codebase can be trained similarly; please refer to the scripts under ./scripts/training_demos/ for more details.

Ablating the point embedder using our codebase

To investigate the effect of various point embedders, one can use the following commands to train the corresponding models.

MLP-based

./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'coordinate' --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5

Volume-based

./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'volume' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'stylenerf' --mlp_depth 16 --mlp_hidden_dim 128 --mlp_output_dim 64 --r1_gamma 1.5

Tri-plane-based

./scripts/training_demos/ablation3d.sh <NUM_GPUS> <PATH_TO_DATA> --job_name <YOUR_JOB_NAME> --root_work_dir <YOUR_ROOT_DIR> --ref_mode 'triplane' --fv_feat_res 64 --use_positional_encoding false --mlp_type 'eg3d' --mlp_depth 2 --mlp_hidden_dim 64 --mlp_output_dim 32 --r1_gamma 1.5

Inspect training results

Besides using TensorBoard to track the training process, the raw results (e.g., training losses and running time) are saved in JSON Lines format. They can be inspected with the following script:

import json

file_name = '<PATH_TO_WORK_DIR>/log.json'

data_entries = []
with open(file_name, 'r') as f:
    for line in f:
        data_entry = json.loads(line)
        data_entries.append(data_entry)

# An example of data entry
# {"Loss/D Fake": 0.4833524551040682, "Loss/D Real": 0.4966000154727226, "Loss/G": 1.1439273656869773, "Learning Rate/Discriminator": 0.002352941082790494, "Learning Rate/Generator": 0.0020000000949949026, "data time": 0.0036810599267482758, "iter time": 0.24490128830075264, "run time": 66108.140625}

Inference for visualization

After training a model, one can employ the following script to run inference and visualize the results, including images, videos, and geometries.

CUDA_VISIBLE_DEVICES=0 python test_3d_inference.py --model <PATH_TO_MODEL> --work_dir <PATH_TO_WORK_DIR> --save_image true --save_video false --save_shape true --shape_res 512 --num 10 --truncation_psi 0.7

Evaluate metrics

After training a model, one can use the following script to evaluate various metrics, including FID, face identity consistency (ID), depth error (DE), pose error (PE), and reprojection error (RE).

python -m torch.distributed.launch --nproc_per_node=1 test_3d_metrics.py --dataset <PATH_TO_DATA> --model <PATH_TO_MODEL>  --test_fid true --align_face true --test_identity true --test_reprojection_error true --test_pose true --test_depth true --fake_num 1000

TODO

  • Upload pretrained checkpoints
  • User Guide

Acknowledgement

This repository is built upon Hammer, on top of which we reimplement GRAF, GIRAFFE, π-GAN, StyleSDF, StyleNeRF, VolumeGAN, GRAM, EpiGRAF and EG3D.

BibTeX

@article{wang2023benchmarking,
  title   = {Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase},
  author  = {Wang, Qiuyu and Shi, Zifan and Zheng, Kecheng and Xu, Yinghao and Peng, Sida and Shen, Yujun},
  journal = {arXiv preprint arXiv:2306.12423},
  year    = {2023}
}
