
This is the official implementation of the paper

    Score Distillation via Reparametrized DDIM

[Figure: sample generation results]

Artem Lukoianov 1,   Haitz Sáez de Ocáriz Borde 2,   Kristjan Greenewald 3,   Vitor Campagnolo Guizilini 4,   Timur Bagautdinov 5,   Vincent Sitzmann 1,   Justin Solomon 1

1 Massachusetts Institute of Technology,  2 University of Oxford,  3 MIT-IBM Watson AI Lab, IBM Research,  4 Toyota Research Institute,  5 Meta Reality Labs Research

For any questions, please shoot an email to arteml@mit.edu

Prerequisites

For this project we recommend using a UNIX server with CUDA support and a GPU with at least 40GB of VRAM. If the available VRAM is limited, we recommend reducing the rendering resolution by adding the following arguments to the run command:

data.width=128 data.height=128

Please note that this will reduce the quality of the generated shapes.
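
For example, assuming the generation command from the "Running generation" section below, a reduced-resolution run would look like this (the prompt is only a placeholder):

python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="a photograph of a ninja" data.width=128 data.height=128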

Installation

This project is based on Threestudio. Below is an example of the installation used by the authors for Ubuntu 22.04 and CUDA 12.3:

conda create -n threestudio-sdi python=3.9
conda activate threestudio-sdi

# Consult https://pytorch.org/get-started/locally/ for the latest PyTorch installation instructions
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

pip install ninja
pip install -r requirements.txt

For additional options, please refer to the official Threestudio installation instructions here.

Running generation

The process of generating a shape is similar to the one described in the Threestudio documentation. Make sure you are using the SDI config file, as shown below. Here are a few examples with different prompts:

python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="pumpkin head zombie, skinny, highly detailed, photorealistic"

python launch.py --config configs/sdi.yaml --train --gpu 1 system.prompt_processor.prompt="a photograph of a ninja"

python launch.py --config configs/sdi.yaml --train --gpu 2 system.prompt_processor.prompt="a zoomed out DSLR photo of a hamburger"

python launch.py --config configs/sdi.yaml --train --gpu 3 system.prompt_processor.prompt="bagel filled with cream cheese and lox"

The results will be saved to outputs/score-distillation-via-inversion/.

Export Meshes

To export the scene to textured meshes, use the --export option. Threestudio currently supports exporting to obj+mtl or obj with vertex colors:

# this uses default mesh-exporter configurations which exports obj+mtl
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter
# specify system.exporter.fmt=obj to get obj with vertex colors
# you may also add system.exporter.save_uv=false to accelerate the process, suitable for a quick peek of the result
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.exporter.fmt=obj
# for NeRF-based methods (DreamFusion, Magic3D coarse, Latent-NeRF, SJC)
# you may need to adjust the isosurface threshold (25 by default) to get satisfying outputs
# decrease the threshold if the extracted model is incomplete, increase if it is extruded
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_threshold=10.
# use marching cubes of higher resolutions to get more detailed models
python launch.py --config path/to/trial/dir/configs/parsed.yaml --export --gpu 0 resume=path/to/trial/dir/ckpts/last.ckpt system.exporter_type=mesh-exporter system.geometry.isosurface_method=mc-cpu system.geometry.isosurface_resolution=256

For all the options you can specify when exporting, see the documentation.

See here for example running commands for all the supported models, here for tips on getting higher-quality results, and here for tips on reducing VRAM usage.

Ablations

There are 5 main parameters in system.guidance to reproduce the ablation results (see the example command after the list):

enable_sdi: true # if true, the noise is obtained by running the DDIM inversion process; if false, noise is sampled randomly as in SDS
inversion_guidance_scale: -7.5 # guidance scale for DDIM inversion process
inversion_n_steps: 10 # number of steps in the inversion process
inversion_eta: 0.3 # amount of random noise added at the end of the inversion process
t_anneal: true # if true, the timestep t is annealed from 0.98 to 0.2 instead of being sampled from U[0.2, 0.98] as in SDS
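
These parameters can be overridden from the command line in the same way as the prompt. For example, the following run (a sketch, assuming the same dot-path override mechanism used in the commands above) disables DDIM inversion and timestep annealing to recover an SDS-like baseline:

python launch.py --config configs/sdi.yaml --train --gpu 0 system.prompt_processor.prompt="a photograph of a ninja" system.guidance.enable_sdi=false system.guidance.t_anneal=false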

2D Generation

There are 2 main methods to perform score distillation in 2D with the insights from our paper. The first one, as in the 3D case, is inferring the noise with DDIM inversion. The absence of other views, however, allows us to use a second method: caching of $\kappa$, which is also provided in the notebook. A minimal sketch of the first method is given below; please consult 2dplayground_SDI_version.ipynb for the full details.
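
The following sketch is not the notebook's actual code: eps_model(x, t, prompt) and alpha_bar are hypothetical stand-ins for a pretrained diffusion model's noise predictor and its cumulative noise schedule, and classifier-free guidance is omitted for brevity. It only illustrates how the noise used in the distillation update is inferred by DDIM inversion instead of being sampled randomly.

import torch

# Hypothetical stand-ins (assumptions, not the notebook's API):
#   eps_model(x, t, prompt) -> predicted noise from a pretrained diffusion model
#   alpha_bar               -> cumulative noise schedule, a 1D tensor indexed by timestep

def ddim_invert(x0, t_target, eps_model, alpha_bar, prompt, n_steps=10):
    """Run DDIM inversion from a clean image x0 up to timestep t_target
    and return the inverted noisy sample x_t (sketch)."""
    ts = torch.linspace(0, int(t_target), n_steps + 1).round().long()
    x = x0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        eps = eps_model(x, t_cur, prompt)                          # predicted noise at t_cur
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        x0_hat = (x - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()   # predicted clean image
        x = a_next.sqrt() * x0_hat + (1.0 - a_next).sqrt() * eps   # DDIM step towards more noise
    return x

def sdi_update_direction(x0, t, eps_model, alpha_bar, prompt):
    """One 2D update direction: the noise is inferred by DDIM inversion of the
    current image instead of being sampled randomly as in SDS (sketch)."""
    x_t = ddim_invert(x0, t, eps_model, alpha_bar, prompt)
    a = alpha_bar[t]
    eps_inferred = (x_t - a.sqrt() * x0) / (1.0 - a).sqrt()        # noise implied by x_t and x0
    eps_pred = eps_model(x_t, t, prompt)                           # model's denoising prediction
    return eps_pred - eps_inferred                                 # update direction applied to x0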

Citing

If you find our project useful, please consider citing it:

@misc{lukoianov2024score,
    title={Score Distillation via Reparametrized DDIM}, 
    author={Artem Lukoianov and Haitz Sáez de Ocáriz Borde and Kristjan Greenewald and Vitor Campagnolo Guizilini and Timur Bagautdinov and Vincent Sitzmann and Justin Solomon},
    year={2024},
    eprint={2405.15891},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
