LaVin-DiT: Large Vision Diffusion Transformer

Zhaoqing Wang  ·  Xiaobo Xia  ·  Runnan Chen  ·  Dongdong Yu
Changhu Wang  ·  Mingming Gong  ·  Tongliang Liu

  • 2025/1/27 Update: Release inference code and checkpoints!
  • 2024/11/24 Update: Add project homepage, LaVin-DiT!

Installation

To get started, clone this project, create a conda virtual environment using Python 3.10+, and install the requirements:

# clone the repo
git clone https://github.com/DerrickWang005/LaVin-DiT.git
cd LaVin-DiT

# create conda environment
conda create -n lavin python=3.10
conda activate lavin
conda install pytorch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt

# apex
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise:
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
# return to the LaVin-DiT repository root
cd ..

# flash attention
pip install packaging ninja
pip install flash-attn --no-build-isolation
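
Once the steps above finish, it can help to confirm that the compiled extensions import before moving on. The following is a minimal sketch, not part of the repository: `missing_modules` is a hypothetical helper that prints every module the given Python interpreter fails to import.

```shell
# Hypothetical helper (not part of LaVin-DiT): print each module that the
# given Python interpreter cannot import, one name per line.
# Usage: missing_modules <python-executable> <module>...
missing_modules() {
    local py="$1"
    shift
    local mod
    for mod in "$@"; do
        # A non-zero exit status from the interpreter means the import failed.
        "$py" -c "import ${mod}" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}
```

For example, `missing_modules python torch torchvision apex flash_attn` inside the `lavin` environment should print nothing if the installation succeeded.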

Inference

  • Before running inference, please install git-lfs to download the pretrained models and template prompts.
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
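  • Download and extract the pretrained models and template prompts:
# run from the LaVin-DiT repository root; the Hugging Face clone lands in ./LaVin-DiT
git clone https://huggingface.co/DerrickWang005/LaVin-DiT
mv LaVin-DiT/template.tar.gz LaVin-DiT/weights.tar.gz ./
rm -rf LaVin-DiT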
tar xzf template.tar.gz
tar xzf weights.tar.gz
  • We provide test examples under test_sample/. Run the following commands to process several vision tasks, such as depth estimation, surface normal estimation, and panoptic segmentation:
export PYTHONPATH="."
export VAE_CKPT="weights/stvae.pt"
export DIT_CKPT="weights/lavin_dit.safetensors"
export QUERY_IMAGE="test_sample/hinton.png"
export OUTPUT="hinton.png"


# depth estimation
python inference.py \
    --query $QUERY_IMAGE \
    --output result_depth_$OUTPUT \
    --task_dir template/depth_estimation \
    --vae_path $VAE_CKPT \
    --dit_path $DIT_CKPT \
    --height 512 \
    --width 512

# surface normal estimation
python inference.py \
    --query $QUERY_IMAGE \
    --output result_normal_$OUTPUT \
    --task_dir template/normal_estimation \
    --vae_path $VAE_CKPT \
    --dit_path $DIT_CKPT \
    --height 512 \
    --width 512

# panoptic segmentation
python inference.py \
    --query $QUERY_IMAGE \
    --output result_pseg_$OUTPUT \
    --task_dir template/panoptic_segmentation \
    --vae_path $VAE_CKPT \
    --dit_path $DIT_CKPT \
    --height 512 \
    --width 512
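
The commands above assume that the archives were extracted into weights/ and template/ under the current directory. A minimal sketch for checking this before launching inference follows. `missing_assets` is a hypothetical helper, and the paths it checks are taken only from the commands in this section, so the archives may contain more than this.

```shell
# Hypothetical helper (not part of LaVin-DiT): print every expected
# checkpoint or task directory that is missing under the given root.
# Usage: missing_assets [root-directory]   (defaults to the current directory)
missing_assets() {
    local root="${1:-.}"
    local rel
    for rel in \
        weights/stvae.pt \
        weights/lavin_dit.safetensors \
        template/depth_estimation \
        template/normal_estimation \
        template/panoptic_segmentation
    do
        # Report the relative path when neither a file nor a directory exists.
        [ -e "${root}/${rel}" ] || printf '%s\n' "$rel"
    done
}
```

Empty output from `missing_assets` means all five paths were found.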

More information can be found in infer.sh.
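
Since the three commands above differ only in the task directory and the output prefix, they can be wrapped in a small function. This is a sketch under assumptions: `run_task`, `run_all_tasks`, and the `DRY_RUN` variable are illustrative names that do not come from the repository (infer.sh may be organised differently). The flags and default paths are copied from the commands above.

```shell
# Hypothetical wrapper (not part of LaVin-DiT): run inference.py for one task.
# With DRY_RUN=1 the command is printed instead of executed.
# Usage: run_task <task-directory-name> <output-prefix>
run_task() {
    local task="$1" prefix="$2"
    # Rebuild the positional parameters as the full command line,
    # falling back to the values used in this README when a variable is unset.
    set -- python inference.py \
        --query "${QUERY_IMAGE:-test_sample/hinton.png}" \
        --output "result_${prefix}_${OUTPUT:-hinton.png}" \
        --task_dir "template/${task}" \
        --vae_path "${VAE_CKPT:-weights/stvae.pt}" \
        --dit_path "${DIT_CKPT:-weights/lavin_dit.safetensors}" \
        --height 512 \
        --width 512
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        PYTHONPATH="${PYTHONPATH:-.}" "$@"
    fi
}

# Run the three tasks shown in this README, stopping at the first failure.
run_all_tasks() {
    run_task depth_estimation depth &&
    run_task normal_estimation normal &&
    run_task panoptic_segmentation pseg
}
```

Setting `DRY_RUN=1` before calling `run_all_tasks` prints the three command lines without loading any model, which is a cheap way to check paths first.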

BibTeX

@article{wang2024lavin,
  title={LaVin-DiT: Large Vision Diffusion Transformer},
  author={Wang, Zhaoqing and Xia, Xiaobo and Chen, Runnan and Yu, Dongdong and Wang, Changhu and Gong, Mingming and Liu, Tongliang},
  journal={arXiv preprint arXiv:2411.11505},
  year={2024}
}

Acknowledgements

The project is based on SiT, Flux and CogVideoX. Many thanks to these three projects for their excellent contributions!
