This is the official implementation of the paper "Score Distillation via Reparametrized DDIM".
Artem Lukoianov 1, Haitz Sáez de Ocáriz Borde 2, Kristjan Greenewald 3, Vitor Campagnolo Guizilini 4, Timur Bagautdinov 5, Vincent Sitzmann 1, Justin Solomon 1
1 Massachusetts Institute of Technology, 2 University of Oxford, 3 MIT-IBM Watson AI Lab, IBM Research, 4 Toyota Research Institute, 5 Meta Reality Labs Research
For any questions, please shoot an email to arteml@mit.edu
For this project we recommend using a UNIX server with CUDA support and a GPU with at least 40GB of VRAM. If the amount of available VRAM is limited, we recommend reducing the rendering resolution by adding the following arguments to the launch command:
data.width=128 data.height=128
Please note that this will reduce the quality of the generated shapes.
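For example, appending these flags to one of the generation commands shown below:

# Same launch command as in the generation section, with reduced rendering resolution
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="a photograph of a ninja" data.width=128 data.height=128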
This project is based on Threestudio. Below is an example of the installation used by the authors for Ubuntu 22.04 and CUDA 12.3:
conda create -n threestudio-sdi python=3.9
conda activate threestudio-sdi
# Consult https://pytorch.org/get-started/locally/ for the latest PyTorch installation instructions
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install ninja
pip install -r requirements.txt
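After installation, a quick sanity check is to confirm that PyTorch can see your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"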
For additional options, please refer to the official threestudio installation instructions.
The process of generating a shape is similar to the one described in the threestudio documentation. Make sure you are using the SDI config file, as below. Here are a few examples with different prompts:
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"
python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"
python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"
python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"
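To queue several prompts sequentially on one GPU, a simple shell loop works as well (a convenience sketch, not part of the project's tooling):

for prompt in "a photograph of a ninja" "bagel filled with cream cheese and lox"; do
    python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="$prompt"
done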
The results will be saved to outputs/score-distillation-via-inversion/.
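Each run creates its own trial directory inside that folder; to find the most recent one, you can sort by modification time:

ls -t outputs/score-distillation-via-inversion/ | head -n 1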
To export the scene to textured meshes, use the --export option. Threestudio currently supports exporting to obj+mtl, or obj with vertex colors:
# this uses default mesh-exporter configurations which exports obj+mtl
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter
# specify system.exporter.fmt=obj to get obj with vertex colors
# you may also add system.exporter.save_uv=false to accelerate the process, suitable for a quick peek of the result
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj
# for NeRF-based methods (DreamFusion, Magic3D coarse, Latent-NeRF, SJC)
# you may need to adjust the isosurface threshold (25 by default) to get satisfying outputs
# decrease the threshold if the extracted model is incomplete, increase if it is extruded
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.
# use marching cubes of higher resolutions to get more detailed models
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256
For all the options you can specify when exporting, see the documentation.
See here for example running commands for all of our supported models, here for tips on getting higher-quality results, and here for tips on reducing VRAM usage.
There are 5 main parameters in system.guidance that can be used to reproduce the ablation results:
enable_sdi: true # if true, the noise is obtained by running the DDIM inversion process; if false, noise is sampled randomly as in SDS
inversion_guidance_scale: -7.5 # guidance scale for the DDIM inversion process
inversion_n_steps: 10 # number of steps in the inversion process
inversion_eta: 0.3 # amount of random noise added at the end of the inversion process
t_anneal: true # if true, the timestep t is annealed from 0.98 to 0.2 instead of being sampled from U[0.2, 0.98] as in SDS
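These parameters can be overridden from the command line like any other threestudio option. For example, here is a sketch of an SDS-style ablation run that disables both inversion and timestep annealing (the flag combination is ours, chosen for illustration):

# Ablation: random noise (as in SDS) and uniform timestep sampling
python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="a photograph of a ninja" system.guidance.enable_sdi=false system.guidance.t_anneal=false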
There are 2 main methods that allow performing score distillation in 2D with the insights from our paper. The first one, as in the 3D case, is inferring noise with DDIM inversion. The absence of other views, however, allows us to use a second method: caching the noise. See 2dplayground_SDI_version.ipynb for more details.
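For reference, a single deterministic DDIM inversion step has a simple closed form. Below is a minimal, self-contained sketch; the function name and signature are ours for illustration and do not correspond to this repository's API:

import torch

def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    """One deterministic DDIM inversion step: map x_t to a noisier x_{t+1}.

    x_t:        current latent (torch.Tensor)
    eps:        noise predicted by the diffusion model at timestep t
    alpha_t:    cumulative alpha-bar at the current timestep (scalar tensor)
    alpha_next: cumulative alpha-bar at the next, noisier timestep
    """
    # Estimate the clean sample x_0 from x_t and the noise prediction
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    # Re-noise the estimate toward the next timestep along the DDIM trajectory
    return alpha_next.sqrt() * x0_pred + (1 - alpha_next).sqrt() * eps

# Toy usage with dummy tensors (in practice eps comes from the UNet):
x = torch.randn(1, 4, 64, 64)
eps = torch.randn_like(x)
x_noisier = ddim_inversion_step(x, eps, torch.tensor(0.9), torch.tensor(0.8))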
If you find our project useful, please consider citing it:
@misc{lukoianov2024score,
    title={Score Distillation via Reparametrized DDIM},
    author={Artem Lukoianov and Haitz Sáez de Ocáriz Borde and Kristjan Greenewald and Vitor Campagnolo Guizilini and Timur Bagautdinov and Vincent Sitzmann and Justin Solomon},
    year={2024},
    eprint={2405.15891},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}