Pierre Musacchio1, Hyunmin Lee2, Jaesik Park1
1Seoul National University, 2LG AI Research
This work is an extension of the paper "Instance-Wise Occlusion and Depth Order in Natural Scenes" by Hyunmin Lee and Jaesik Park (CVPR 2022).
This repository provides downloads for:
- The InstaOrder dataset. ✅
- The InstaOrder Panoptic dataset. ✅
- Weights for the InstaOrderNet model family. ✅
- Weights for the InstaFormer model family. ✅
We also explain how to train and evaluate the InstaFormer model family.
The InstaOrder dataset extends the COCO dataset. Carefully annotated for occlusion and depth order prediction, it contains 2.9M annotations on 101K natural scenes. [Click here for download]
The InstaOrder Panoptic dataset extends the COCO panoptic dataset with "thing" annotations for occlusion and depth order prediction, totaling 2.9M annotations on 101K natural scenes. [Click here for download]
Note: we plan to make this repository also run plain InstaOrderNets, but this is not yet implemented. To run those networks, please refer to our former InstaOrder repository.
The InstaOrderNet family predicts pairwise occlusion and depth order given an input image and two instance masks. It comes in three flavors: 'o' (occlusion only), 'd' (depth only), and 'o,d' (joint occlusion and depth).
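To illustrate the pairwise setting, here is a toy sketch (not the repository's API, and the pairwise predictions are made up) of how per-pair occlusion predictions can be aggregated into an N×N occlusion-order matrix:

```python
# Toy sketch (not the repository's API): aggregating pairwise occlusion
# predictions into an N x N order matrix. occ[i][j] == 1 means instance i
# occludes instance j; both directions may hold when objects mutually occlude.
N = 3
# hypothetical pairwise predictions: (i, j) -> (i occludes j, j occludes i)
pairwise = {(0, 1): (1, 0), (1, 2): (1, 1), (0, 2): (0, 0)}

occ = [[0] * N for _ in range(N)]
for (i, j), (ij, ji) in pairwise.items():
    occ[i][j] = ij
    occ[j][i] = ji

print(occ)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

The depth order can be collected into a matrix the same way, with one entry per ordered instance pair.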
Model | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|---|---|---|
InstaOrderNet^o | 89.39 | 79.83 | 80.65 | -- | -- | -- | model |
InstaOrderNet^d | -- | -- | -- | 12.95 | 25.96 | 17.51 | model |
InstaOrderNet^o,d | 82.37 | 88.67 | 81.86 | 11.51 | 25.22 | 15.99 | model |
The InstaFormer family performs end-to-end holistic occlusion and depth order prediction. It comes in three flavors: 'o' (occlusion only), 'd' (depth only), and 'o,d' (joint occlusion and depth). In all cases, the model also outputs the scene segmentation.
For clarity, we only report occlusion and depth order prediction results here; please refer to the paper for the segmentation results.
This model flavor exclusively predicts occlusion orders.
Backbone Config | Recall | Precision | F1 | Weights |
---|---|---|---|---|
SWIN-T100 | 89.06 | 75.69 | 79.63 | model |
SWIN-S100 | 88.91 | 77.31 | 80.53 | model |
SWIN-B100 | 89.02 | 76.95 | 80.64 | model |
SWIN-B†100 | 89.53 | 77.34 | 80.99 | model |
SWIN-L†200 | 89.82 | 78.10 | 81.89 | model |
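The Recall, Precision, and F1 columns above score occlusion-order prediction. A minimal sketch (toy data, assuming each ordered instance pair carries a yes/no "i occludes j" label) of how such pairwise binary metrics are computed:

```python
# Minimal sketch (toy data): precision/recall/F1 over ordered instance
# pairs, where each label answers "does i occlude j?".
def prf1(pred, gt):
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf1(pred=[1, 1, 0, 1], gt=[1, 0, 0, 1])
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 1.0 0.8
```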
This model flavor exclusively predicts depth orders.
Backbone Config | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|
SWIN-T100 | 8.10 | 25.43 | 13.75 | model |
SWIN-S100 | 8.44 | 26.04 | 14.48 | model |
SWIN-B100 | 8.28 | 25.05 | 13.88 | model |
SWIN-B†100 | 8.15 | 25.19 | 13.72 | model |
SWIN-L†200 | 8.47 | 24.91 | 13.73 | model |
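WHDR (Weighted Human Disagreement Rate) is the weighted fraction of instance pairs whose predicted depth relation disagrees with the human annotation (lower is better); the "distinct"/"overlap" columns split pairs by whether the two instances overlap. A minimal sketch on toy relations, assuming equal pair weights:

```python
# Minimal sketch (toy data, equal weights): WHDR as the weighted fraction
# of pairs whose predicted depth relation ('<', '=', '>') disagrees with
# the human label, reported in percent.
def whdr(pred, gt, weights=None):
    weights = weights or [1.0] * len(gt)
    disagree = sum(w for p, g, w in zip(pred, gt, weights) if p != g)
    return 100.0 * disagree / sum(weights)

print(whdr(pred=["<", "=", ">", "<"], gt=["<", ">", ">", "<"]))  # 25.0
```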
This model flavor jointly predicts occlusion and depth orders.
Backbone Config | Recall | Precision | F1 | WHDR (distinct) | WHDR (overlap) | WHDR (all) | Weights |
---|---|---|---|---|---|---|---|
SWIN-T100 | 88.64 | 75.56 | 79.74 | 8.43 | 25.36 | 14.03 | model |
SWIN-S100 | 88.20 | 75.98 | 79.57 | 8.54 | 25.42 | 13.96 | model |
SWIN-B100 | 88.47 | 75.96 | 79.72 | 8.84 | 25.77 | 14.39 | model |
SWIN-B†100 | 89.24 | 76.66 | 80.34 | 8.15 | 25.79 | 14.06 | model |
SWIN-L†200 | 89.57 | 78.07 | 81.37 | 7.90 | 24.68 | 13.30 | model |
This code has been developed under NVCC 11.7, Python 3.8.18, PyTorch 2.1.0, torchvision 0.16.0, and detectron2 0.6 (built from source at commit 80307d2 due to import issues).
We strongly recommend building the code inside a Docker container with a conda environment.
First, install the apt-get dependencies:

```shell
apt-get update && apt-get upgrade -y
# ninja
apt-get install ninja-build -y
# opencv dependencies
apt-get install libgl1-mesa-glx libglib2.0-0 -y
```
Then, create a conda environment and activate it:

```shell
conda create -n instaorder python=3.8 -y
conda activate instaorder
```
Finally, run the `quick_install.sh` file:

```shell
. ./quick_install.sh
```
First, prepare the COCO dataset files in the structure explained in this tutorial. Do not forget to set your `$DETECTRON2_DATASETS` environment variable to the proper directory.
Then, simply place the InstaOrder Panoptic json file downloaded in the previous section in the `annotations` directory.
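The resulting layout can be sketched as follows (a rough sketch assuming the standard detectron2 COCO structure; the root path and the json filename are placeholders, not the repository's exact names):

```python
# Sketch: create/inspect the expected dataset layout. The root defaults
# to ./datasets when $DETECTRON2_DATASETS is unset (an assumption for
# illustration; point it at your real dataset directory).
import os

root = os.environ.get("DETECTRON2_DATASETS", "./datasets")
ann_dir = os.path.join(root, "coco", "annotations")
os.makedirs(ann_dir, exist_ok=True)
# Place the downloaded InstaOrder Panoptic json inside this directory:
print(ann_dir)
```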
First, download a pre-trained Mask2Former panoptic model from the Mask2Former model Zoo, then run the following command:
```shell
python train_net.py \
    --num-gpus <gpus> \
    --config-file <path/to/instaformer/cfg.yaml> \
    MODEL.WEIGHTS <path/to/m2f/weights.pkl> \
    SOLVER.IMS_PER_BATCH <batch>
```
Where:
- `<gpus>` is the number of GPUs for training,
- `<path/to/instaformer/cfg.yaml>` is a yaml file of the model's config (located in `configs/instaorder/`),
- `<path/to/m2f/weights.pkl>` is a `.pkl` file containing the weights of the Mask2Former model of your choice,
- `<batch>` is the batch size for the training.
Evaluation of a trained InstaFormer model can be run using this command:

```shell
python train_net.py \
    --eval-only \
    --num-gpus <gpus> \
    --config-file <path/to/instaformer/cfg.yaml> \
    MODEL.WEIGHTS <path/to/instaformer/weights.pth>
```
Inference on custom images can be run using the following command:

```shell
python demo/demo.py \
    --config-file <path/to/instaformer/cfg.yaml> \
    --input <path/to/image.jpg> \
    --output <path/to/out/dir> \
    MODEL.WEIGHTS <path/to/instaformer/weights.pth> \
    TEST.OCCLUSION_EVALUATION False \
    TEST.DEPTH_EVALUATION False
```
Where:
- `<path/to/instaformer/cfg.yaml>` is a yaml file of the model's config (located in `configs/instaorder/`),
- `<path/to/image.jpg>` is the input image file path,
- `<path/to/out/dir>` is the folder path where the output will be stored,
- `<path/to/instaformer/weights.pth>` is a `.pth` file containing the weights of the trained InstaFormer model.
Since the configuration is made for training and evaluation, you have to manually set `TEST.OCCLUSION_EVALUATION` and `TEST.DEPTH_EVALUATION` to `False`.
We do not yet have a citation for our most recent work; however, if you found our work useful, please consider citing our former work:

```bibtex
@inproceedings{lee2022instaorder,
    title={{Instance-wise Occlusion and Depth Orders in Natural Scenes}},
    author={Hyunmin Lee and Jaesik Park},
    booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
    year={2022}
}
```
Our code is based on Mask2Former's official repository. We thank the authors for open-sourcing their code to the community.