IGEV++

IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching
Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Junda Cheng, Chunyuan Liao, Xin Yang

Visual comparisons with SOTA methods in large disparities.

PCWNet is a volume filtering-based method, DLNR is an iterative optimization-based method, and GMStereo is a transformer-based method. Our IGEV++ performs well in large textureless regions at close range with large disparities.

Network architecture

IGEV++ first builds Multi-range Geometry Encoding Volumes (MGEV) via Adaptive Patch Matching (APM). After 3D aggregation or regularization, MGEV encodes coarse-grained geometry information of the scene for textureless regions and large disparities, and fine-grained geometry information for details and small disparities. We then regress an initial disparity map from MGEV through soft argmin, which serves as the starting point for the ConvGRUs. In each iteration, we index multi-range and multi-granularity geometry features from MGEV, selectively fuse them, and feed them into the ConvGRUs to update the disparity field.
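The soft argmin step can be illustrated for a single pixel. This is a minimal sketch and not the repository's implementation: the function name `soft_argmin` and the convention that lower values mean better matches are assumptions, and the network works on full tensors rather than Python lists.

```python
import math

def soft_argmin(costs):
    """Expected disparity for one pixel from per-disparity matching costs.

    Assumption: lower cost means a better match, so the softmax is taken
    over the negated costs. Index d in `costs` is disparity candidate d.
    """
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    # Probability-weighted average of the disparity candidates.
    return sum(d * w / total for d, w in enumerate(weights))

if __name__ == "__main__":
    # A sharp minimum at index 2 yields an estimate close to 2.
    print(soft_argmin([100.0, 100.0, 0.0, 100.0]))
```

Unlike a hard argmin, the result is a weighted average, so it can take sub-pixel values and stays differentiable.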

📢 News

2024-12-30: We added bfloat16 training to prevent potential NaN issues during training.

Comparisons with SOTA methods

Left: Comparisons with state-of-the-art stereo methods across different disparity ranges on the Scene Flow test set. Our IGEV++ outperforms previously published methods by a large margin across all disparity ranges. Right: Comparisons with state-of-the-art stereo methods on Middlebury and KITTI leaderboards.

Demos

Pretrained models can be downloaded from Google Drive.

We assume the downloaded pretrained weights are located under the pretrained_models directory.

You can run a trained model on pairs of images. To predict disparity for the image pairs in the demo-imgs directory, run

python demo_imgs.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --left_imgs './demo-imgs/*/im0.png' --right_imgs './demo-imgs/*/im1.png'

You can switch to your own test data directory, or place your own pairs of test images in ./demo-imgs.
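Before pointing the demo at your own data, it can help to confirm that the two glob patterns select matching left and right images. The sketch below is a hypothetical helper, not part of the repository; it assumes that pairs are matched by sorted order, which may differ from what demo_imgs.py does internally.

```python
import glob

def pair_stereo_images(left_pattern, right_pattern):
    """Return (left, right) path pairs, matched by sorted order of two globs."""
    left = sorted(glob.glob(left_pattern, recursive=True))
    right = sorted(glob.glob(right_pattern, recursive=True))
    if len(left) != len(right):
        raise ValueError(f"{len(left)} left images but {len(right)} right images")
    return list(zip(left, right))

if __name__ == "__main__":
    # Patterns copied from the demo command above.
    for l, r in pair_stereo_images("./demo-imgs/*/im0.png", "./demo-imgs/*/im1.png"):
        print(l, "<->", r)
```

A count mismatch raises an error early, which is cheaper than discovering a missing image halfway through inference.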

Environment

NVIDIA RTX 3090
Python 3.8

Create a virtual environment and activate it.

conda create -n IGEV_plusplus python=3.8
conda activate IGEV_plusplus

Dependencies

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install timm==0.5.4
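After installation, a quick check of the installed versions can catch a mismatched environment. This is a sketch under stated assumptions: `check_environment` and `EXPECTED` are hypothetical names, the pins are copied from the pip commands above, and local build tags such as `+cu113` are ignored in the comparison.

```python
from importlib import metadata

# Pins copied from the pip commands above; None means no version is pinned.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "tqdm": None,
    "scipy": None,
    "opencv-python": None,
    "scikit-image": None,
    "tensorboard": None,
    "matplotlib": None,
    "timm": "0.5.4",
}

def check_environment(expected=EXPECTED):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, pin in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Compare public versions only, dropping a local tag like "+cu113".
        if pin is None or installed.split("+")[0] == pin:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {pin}"
    return report

if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```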

Required Data

Evaluation

To evaluate IGEV++ on Scene Flow or Middlebury, run

python evaluate_stereo.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --dataset sceneflow

or

python evaluate_stereo.py --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --dataset middlebury_H

To evaluate RT-IGEV++ (real-time version) on Scene Flow, run

python evaluate_stereo_rt.py --dataset sceneflow --restore_ckpt ./pretrained_models/igev_rt/sceneflow.pth
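Stereo benchmarks are commonly summarized by the end-point error (EPE, the mean absolute disparity error) and by the fraction of pixels whose error exceeds a threshold. The sketch below shows both on flat lists of values. It is an illustration only: the function name, the `threshold` default, and the use of a validity mask are assumptions, and the evaluation scripts may compute their metrics differently.

```python
def epe_and_outlier_rate(pred, gt, valid, threshold=3.0):
    """Return (EPE, outlier fraction) over the valid pixels.

    pred, gt: sequences of disparities in pixels; valid: sequence of booleans.
    `threshold` is a hypothetical parameter, in pixels.
    """
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")
    epe = sum(errors) / len(errors)
    outlier_rate = sum(e > threshold for e in errors) / len(errors)
    return epe, outlier_rate

if __name__ == "__main__":
    pred = [10.0, 20.0, 30.0, 0.0]
    gt = [10.0, 24.0, 31.0, 5.0]
    valid = [True, True, True, False]  # last pixel has no ground truth
    print(epe_and_outlier_rate(pred, gt, valid))
```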

Training

To train IGEV++ on Scene Flow or KITTI, run

python train_stereo.py --train_datasets sceneflow

or

python train_stereo.py --train_datasets kitti --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth

To train IGEV++ on Middlebury or ETH3D, run the following two commands in order

python train_stereo.py --train_datasets middlebury_train --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --image_size 384 512 --num_steps 200000
python train_stereo.py --train_datasets middlebury_finetune --restore_ckpt ./checkpoints/middlebury_train.pth --image_size 384 768 --num_steps 100000

or

python train_stereo.py --train_datasets eth3d_train --restore_ckpt ./pretrained_models/igev_plusplus/sceneflow.pth --image_size 384 512 --num_steps 300000
python train_stereo.py --train_datasets eth3d_finetune --restore_ckpt ./checkpoints/eth3d_train.pth --image_size 384 512 --num_steps 100000
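The Middlebury and ETH3D recipes are two-stage: the second command restores the checkpoint written by the first. A small wrapper can chain the stages and stop if the first one fails. This is a hypothetical helper, not part of the repository; it uses only the flags, dataset names, and paths shown above, and assumes that the first stage writes `./checkpoints/<name>_train.pth`.

```python
import subprocess
import sys

SCENEFLOW_CKPT = "./pretrained_models/igev_plusplus/sceneflow.pth"

def two_stage_commands(name, stage1_size, stage1_steps, stage2_size, stage2_steps):
    """Build the argv lists for the <name>_train and <name>_finetune stages."""
    def command(dataset, ckpt, size, steps):
        return [sys.executable, "train_stereo.py",
                "--train_datasets", dataset,
                "--restore_ckpt", ckpt,
                "--image_size", str(size[0]), str(size[1]),
                "--num_steps", str(steps)]
    stage1 = command(f"{name}_train", SCENEFLOW_CKPT, stage1_size, stage1_steps)
    # Assumption: stage 1 saves its result at this path.
    stage2 = command(f"{name}_finetune", f"./checkpoints/{name}_train.pth",
                     stage2_size, stage2_steps)
    return [stage1, stage2]

def run_two_stage(*args):
    for cmd in two_stage_commands(*args):
        subprocess.run(cmd, check=True)  # raises if a stage exits non-zero

# Example (Middlebury settings from above):
# run_two_stage("middlebury", (384, 512), 200000, (384, 768), 100000)
```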

Bfloat16 Training

NaN values during training: If you encounter NaN values during training, the likely cause is overflow in float16. This happens when large gradients or activation values exceed the range that float16 can represent. To fix this:

- Try switching to bfloat16 with --precision_dtype bfloat16.

- Alternatively, use float32 precision with --precision_dtype float32.
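The range difference behind this advice can be shown with the standard library alone. The sketch below is illustrative and unrelated to the training code: `fits_float16` and `to_bfloat16` are hypothetical helpers, and the bfloat16 conversion simply truncates a float32 to its top 16 bits instead of rounding.

```python
import struct

def fits_float16(x):
    """True if x can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False

def to_bfloat16(x):
    """Truncate x to bfloat16 by keeping the top 16 bits of its float32 form."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

if __name__ == "__main__":
    value = 70000.0  # above the largest finite float16 value, 65504
    print("fits in float16:", fits_float16(value))
    print("as bfloat16:", to_bfloat16(value))
```

bfloat16 keeps the 8-bit exponent of float32, so it covers roughly the same range at lower precision, whereas float16 has only a 5-bit exponent and overflows above 65504.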

Training with bfloat16

Before you start training, make sure you have hardware that supports bfloat16 and the right environment set up for mixed precision training. Create the environment and install dependencies into it:
```
conda create -n IGEV_plusplus_bf16 python=3.8
conda activate IGEV_plusplus_bf16
bash env_bfloat16.sh
```

Then you can train the model with bfloat16 precision:

python train_stereo.py --mixed_precision --precision_dtype bfloat16

Submission

For IGEV++ submission to the KITTI benchmark, run

python save_disp.py

For RT-IGEV++ submission to the KITTI benchmark, run

python save_disp_rt.py
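The KITTI stereo benchmark stores disparity maps as 16-bit PNGs in which each pixel holds the disparity multiplied by 256, with 0 marking an invalid pixel. The sketch below shows only that value encoding on nested lists. The function names are hypothetical, and whether save_disp.py clamps or rounds in the same way is an assumption.

```python
def encode_kitti_disparity(disp_rows):
    """Convert rows of float disparities to uint16 values (disparity * 256)."""
    encoded = []
    for row in disp_rows:
        out = []
        for d in row:
            if d is None or d <= 0:
                out.append(0)  # 0 marks an invalid pixel
            else:
                # Clamp to [1, 65535] so valid pixels never collide with 0.
                out.append(min(max(int(round(d * 256.0)), 1), 65535))
        encoded.append(out)
    return encoded

def decode_kitti_disparity(rows):
    """Inverse mapping: uint16 values back to disparities, None if invalid."""
    return [[v / 256.0 if v > 0 else None for v in row] for row in rows]

if __name__ == "__main__":
    print(encode_kitti_disparity([[1.5, 0.0, 300.0]]))
```

With this encoding the largest representable disparity is just under 256 pixels, so larger predictions saturate.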

Citation

If you find our work useful in your research, please consider citing our papers:

@article{xu2024igev++,
  title={IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Zhang, Zhaoxing and Cheng, Junda and Liao, Chunyuan and Yang, Xin},
  journal={arXiv preprint arXiv:2409.00638},
  year={2024}
}

@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21919--21928},
  year={2023}
}

Acknowledgements

This project is based on RAFT-Stereo, GMStereo, and CoEx. We thank the original authors for their excellent work.

