Pierre Musacchio1, Hyunmin Lee2, Jaesik Park1
1Seoul National University, 2LG AI Research
This work is an extension of the paper "Instance-Wise Occlusion and Depth Order in Natural Scenes" by Hyunmin Lee and Jaesik Park (CVPR 2022).
This repository provides downloads for:
- The InstaOrder dataset. ✅
- The InstaOrder Panoptic dataset. ✅
- Weights for the InstaOrderNet model family. ✅
- Weights for the InstaFormer model family. ✅
We also explain how to train and evaluate the InstaFormer model family.
The InstaOrder dataset extends the COCO dataset. Carefully annotated for occlusion and depth order prediction, it contains 2.9M annotations on 101K natural scenes. [Click here for download]
The InstaOrder Panoptic dataset extends the COCO panoptic dataset with "thing" annotations for occlusion and depth order prediction, totaling 2.9M annotations on 101K natural scenes. [Click here for download]
Note: we plan to make this repository also run plain InstaOrderNets, but this is not yet implemented. To run those networks, please refer to our former InstaOrder repository.
The InstaOrderNet family predicts pairwise occlusion and depth order given an input image and two instance masks. It comes in three flavors: 'o' (occlusion only), 'd' (depth only), and 'o,d' (joint occlusion and depth).
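To illustrate the pairwise setting, here is a toy sketch (not the repository's API, and the pairwise predictions are made up) of how per-pair occlusion predictions can be aggregated into an N×N occlusion-order matrix:

```python
# Toy sketch (not the repository's API): aggregating pairwise occlusion
# predictions into an N x N order matrix. occ[i][j] == 1 means instance i
# occludes instance j; both directions may hold when objects mutually occlude.
N = 3
# hypothetical pairwise predictions: (i, j) -> (i occludes j, j occludes i)
pairwise = {(0, 1): (1, 0), (1, 2): (1, 1), (0, 2): (0, 0)}

occ = [[0] * N for _ in range(N)]
for (i, j), (ij, ji) in pairwise.items():
    occ[i][j] = ij
    occ[j][i] = ji

print(occ)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

The depth order can be collected into a matrix the same way, with one entry per ordered instance pair.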
Model | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|---|---|---|
InstaOrderNet^o | 89.39 | 79.83 | 80.65 | -- | -- | -- | model |
InstaOrderNet^d | -- | -- | -- | 12.95 | 25.96 | 17.51 | model |
InstaOrderNet^o,d | 82.37 | 88.67 | 81.86 | 11.51 | 25.22 | 15.99 | model |
The InstaFormer family performs end-to-end holistic occlusion and depth order prediction. It comes in three flavors: 'o' (occlusion only), 'd' (depth only), and 'o,d' (joint occlusion and depth). In all cases, the model also outputs the scene segmentation.
For clarity, we only report occlusion and depth order prediction results here; please refer to the paper for the segmentation results.
This model flavor exclusively predicts occlusion orders.
Backbone Config | Recall | Precision | F1 | Weights |
---|---|---|---|---|
SWIN-T100 | 89.06 | 75.69 | 79.63 | model |
SWIN-S100 | 88.91 | 77.31 | 80.53 | model |
SWIN-B100 | 89.02 | 76.95 | 80.64 | model |
SWIN-B†100 | 89.53 | 77.34 | 80.99 | model |
SWIN-L†200 | 89.82 | 78.10 | 81.89 | model |
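The Recall, Precision, and F1 columns above score occlusion-order prediction. A minimal sketch (toy data, assuming each ordered instance pair carries a yes/no "i occludes j" label) of how such pairwise binary metrics are computed:

```python
# Minimal sketch (toy data): precision/recall/F1 over ordered instance
# pairs, where each label answers "does i occlude j?".
def prf1(pred, gt):
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf1(pred=[1, 1, 0, 1], gt=[1, 0, 0, 1])
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 1.0 0.8
```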
This model flavor exclusively predicts depth orders.
Backbone Config | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|
SWIN-T100 | 8.10 | 25.43 | 13.75 | model |
SWIN-S100 | 8.44 | 26.04 | 14.48 | model |
SWIN-B100 | 8.28 | 25.05 | 13.88 | model |
SWIN-B†100 | 8.15 | 25.19 | 13.72 | model |
SWIN-L†200 | 8.47 | 24.91 | 13.73 | model |
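WHDR (Weighted Human Disagreement Rate) is the weighted fraction of instance pairs whose predicted depth relation disagrees with the human annotation (lower is better); the "distinct"/"overlap" columns split pairs by whether the two instances overlap. A minimal sketch on toy relations, assuming equal pair weights:

```python
# Minimal sketch (toy data, equal weights): WHDR as the weighted fraction
# of pairs whose predicted depth relation ('<', '=', '>') disagrees with
# the human label, reported in percent.
def whdr(pred, gt, weights=None):
    weights = weights or [1.0] * len(gt)
    disagree = sum(w for p, g, w in zip(pred, gt, weights) if p != g)
    return 100.0 * disagree / sum(weights)

print(whdr(pred=["<", "=", ">", "<"], gt=["<", ">", ">", "<"]))  # 25.0
```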
This model flavor jointly predicts occlusion and depth orders.
Backbone Config | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|---|---|---|
SWIN-T100 | 88.64 | 75.56 | 79.74 | 8.43 | 25.36 | 14.03 | model |
SWIN-S100 | 88.20 | 75.98 | 79.57 | 8.54 | 25.42 | 13.96 | model |
SWIN-B100 | 88.47 | 75.96 | 79.72 | 8.84 | 25.77 | 14.39 | model |
SWIN-B†100 | 89.24 | 76.66 | 80.34 | 8.15 | 25.79 | 14.06 | model |
SWIN-L†200 | 89.57 | 78.07 | 81.37 | 7.90 | 24.68 | 13.30 | model |
This code has been developed under NVCC 11.7, Python 3.8.18, PyTorch 2.1.0, torchvision 0.16.0, and detectron2 0.6 (built from source at commit 80307d2 due to import issues).
We strongly recommend building the code inside a Docker container with a conda environment.
First, install the apt-get dependencies:

```shell
apt-get update && apt-get upgrade -y
# ninja
apt-get install ninja-build -y
# opencv dependencies
apt-get install libgl1-mesa-glx libglib2.0-0 -y
```
Then, create a conda environment and activate it:

```shell
conda create -n instaorder python=3.8 -y
conda activate instaorder
```
Finally, run the `quick_install.sh` file:

```shell
. ./quick_install.sh
```
First, prepare the COCO dataset files in the structure explained in this tutorial. Do not forget to set your `$DETECTRON2_DATASETS` environment variable to the proper directory.
Then, simply place the InstaOrder Panoptic json file downloaded in the previous section in the `annotations` directory.
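The resulting layout can be sketched as follows (a rough sketch assuming the standard detectron2 COCO structure; the root path and the json filename are placeholders, not the repository's exact names):

```python
# Sketch: create/inspect the expected dataset layout. The root defaults
# to ./datasets when $DETECTRON2_DATASETS is unset (an assumption for
# illustration; point it at your real dataset directory).
import os

root = os.environ.get("DETECTRON2_DATASETS", "./datasets")
ann_dir = os.path.join(root, "coco", "annotations")
os.makedirs(ann_dir, exist_ok=True)
# Place the downloaded InstaOrder Panoptic json inside this directory:
print(ann_dir)
```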
First, download a pre-trained Mask2Former panoptic model from the Mask2Former model Zoo, then run the following command:
```shell
python train_net.py \
    --num-gpus <gpus> \
    --config-file <path/to/instaformer/cfg.yaml> \
    MODEL.WEIGHTS <path/to/m2f/weights.pkl> \
    SOLVER.IMS_PER_BATCH <batch>
```
Where:
- `<gpus>` is the number of GPUs for training,
- `<path/to/instaformer/cfg.yaml>` is a yaml file of the model's config (located in `configs/instaorder/`),
- `<path/to/m2f/weights.pkl>` is a `.pkl` file containing the weights of the Mask2Former model of your choice,
- `<batch>` is the batch size for the training.
Evaluation of a trained InstaFormer model can be run using this command:

```shell
python train_net.py \
    --eval-only \
    --num-gpus <gpus> \
    --config-file <path/to/instaformer/cfg.yaml> \
    MODEL.WEIGHTS <path/to/instaformer/weights.pth>
```
Inference on custom images can be run using the following command:

```shell
python demo/demo.py \
    --config-file <path/to/instaformer/cfg.yaml> \
    --input <path/to/image.jpg> \
    --output <path/to/out/dir> \
    MODEL.WEIGHTS <path/to/instaformer/weights.pth> \
    TEST.OCCLUSION_EVALUATION False \
    TEST.DEPTH_EVALUATION False
```
Where:
- `<path/to/instaformer/cfg.yaml>` is a yaml file of the model's config (located in `configs/instaorder/`),
- `<path/to/image.jpg>` is the input image file path,
- `<path/to/out/dir>` is the folder path where the output will be stored,
- `<path/to/instaformer/weights.pth>` is a `.pth` file containing the weights of the trained InstaFormer model.
Since the configuration is made for training and evaluation, you have to manually set `TEST.OCCLUSION_EVALUATION` and `TEST.DEPTH_EVALUATION` to `False`.
We do not yet have a citation for our most recent work; however, if you found our work useful, please consider citing our former work:

```bibtex
@inproceedings{lee2022instaorder,
    title={{Instance-wise Occlusion and Depth Orders in Natural Scenes}},
    author={Hyunmin Lee and Jaesik Park},
    booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
    year={2022}
}
```
Our code is based on Mask2Former's official repository. We thank the authors for open-sourcing their code to the community.