Skip to content

Latest commit

 

History

History
169 lines (128 loc) · 6.98 KB

README.md

File metadata and controls

169 lines (128 loc) · 6.98 KB

IGEV++

IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching
Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Junda Cheng, Chunyuan Liao, Xin Yang

Visual comparisons with SOTA methods in large disparities.

image PCWNet is a volume filtering-based method, DLNR is an iterative optimization-based method, and GMStereo is a transformer-based method. Our IGEV++ performs well in large textureless regions at close range with large disparities.

Network architecture

image The IGEV++ first builds Multi-range Geometry Encoding Volumes (MGEV) via Adaptive Patch Matching (APM). MEGV encodes coarse-grained geometry information of the scene for textureless regions and large disparities and fine-grained geometry information for details and small disparities after 3D aggregation or regularization. Then we regress an initial disparity map from MGEV through soft argmin, which serves as the starting point for ConvGRUs. In each iteration, we index multi-range and multi-granularity geometry features from MGEV, selectively fuse them, and then input them into ConvGRUs to update the disparity field.

📢 News

2024-12-30: We add bfloat16 training to prevent potential NAN issues during the training process.

Comparisons with SOTA methods

Left: Comparisons with state-of-the-art stereo methods across different disparity ranges on the Scene Flow test set. Our IGEV++ outperforms previously published methods by a large margin across all disparity ranges. Right: Comparisons with state-of-the-art stereo methods on Middlebury and KITTI leaderboards.

Demos

Pretrained models can be downloaded from google drive

We assume the downloaded pretrained weights are located under the pretrained_models directory.

You can demo a trained model on pairs of images. To predict stereo for demo-imgs directory, run

python demo_imgs.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --left_imgs './demo-imgs/*/im0.png' --right_imgs './demo-imgs/*/im1.png'

You can switch to your own test data directory, or place your own pairs of test images in ./demo-imgs.

Environment

  • NVIDIA RTX 3090
  • python 3.8

Create a virtual environment and activate it.

conda create -n IGEV_plusplus python=3.8
conda activate IGEV_plusplus

Dependencies

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib 
pip install timm==0.5.4

Required Data

Evaluation

To evaluate IGEV++ on Scene Flow or Middlebury, run

python evaluate_stereo.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --dataset sceneflow

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --dataset middlebury_H

To evaluate RT-IGEV++ (real-time version) on Scene Flow, run

python evaluate_stereo_rt.py --dataset sceneflow --restore_ckpt ./pretrained_models/igev_rt/sceneflow.pth

Training

To train IGEV++ on Scene Flow or KITTI, run

python train_stereo.py --train_datasets sceneflow

or

python train_stereo.py --train_datasets kitti --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth

To train IGEV++ on Middlebury or ETH3D, you need to run

python train_stereo.py --train_datasets middlebury_train --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --image_size 384 512 --num_steps 200000
python train_stereo.py --tarin_datasets middlebury_finetune --restore_ckpt ./checkpoints/middlebury_train.pth --image_size 384 768 --num_steps 100000

or

python train_stereo.py --train_datasets eth3d_train --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --image_size 384 512 --num_steps 300000
python train_stereo.py --tarin_datasets eth3d_finetune --restore_ckpt ./checkpoints/eth3d_train.pth --image_size 384 512 --num_steps 100000

Bfloat16 Training

NaN values during training: If you encounter NaN values in your training, this is likely due to overflow when using float16. This can happen when large gradients or high activation values exceed the range represented by float16. To fix this:

-Try switching to bfloat16 by using --precision_dtype bfloat16.

-Alternatively, you can use float32 precision by setting --precision_dtype float32.

Training with bfloat16

  1. Before you start training, make sure you have hardware that supports bfloat16 and the right environment set up for mixed precision training. Create the environment and install dependencies into it:

    conda create -n IGEV_plusplus_bf16 python=3.8
    conda activate IGEV_plusplus_bf16
    bash env_bfloat16.sh
  2. Then you can train the model with bfloat16 precision:

     python train_stereo.py --mixed_precision --precision_dtype bfloat16

Submission

For IGEV++ submission to the KITTI benchmark, run

python save_disp.py

For RT-IGEV++ submission to the KITTI benchmark, run

python save_disp_rt.py

Citation

If you find our works useful in your research, please consider citing our papers:

@article{xu2024igev++,
  title={IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Zhang, Zhaoxing and Cheng, Junda and Liao, Chunyuan and Yang, Xin},
  journal={arXiv preprint arXiv:2409.00638},
  year={2024}
}

@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21919--21928},
  year={2023}
}

Acknowledgements

This project is based on RAFT-Stereo, GMStereo, and CoEx. We thank the original authors for their excellent works.