Meta R-CNN: Towards General Solver for Instance-level Low-shot Learning (ICCV'2019)

Abstract

Resembling the rapid learning capability of humans, low-shot learning empowers vision systems to understand new concepts by training with few samples. Leading approaches are derived from meta-learning on images containing a single visual object. Because detection and segmentation images contain complex backgrounds and multiple objects, these approaches are hard to extend to low-shot object detection/segmentation. In this work, we present a flexible and general methodology to achieve these tasks. Our work extends Faster/Mask R-CNN by proposing meta-learning over RoI (Region-of-Interest) features instead of a full image feature. This simple idea disentangles multi-object information merged with the background, without bells and whistles, enabling Faster/Mask R-CNN to turn into a meta-learner for these tasks. Specifically, we introduce a Predictor-head Remodeling Network (PRN) that shares its main backbone with Faster/Mask R-CNN. PRN receives images containing low-shot objects with their bounding boxes or masks to infer their class-attentive vectors. The vectors apply channel-wise soft attention to RoI features, remodeling the R-CNN predictor heads to detect or segment the objects consistent with the classes these vectors represent. In our experiments, Meta R-CNN yields the new state of the art in low-shot object detection and improves low-shot object segmentation by Mask R-CNN. Code: https://yanxp.github.io/metarcnn.html.
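The channel-wise soft attention described in the abstract can be illustrated with a minimal pure-Python sketch. It makes three assumptions:

- Pooled RoI features and support features are plain C-dimensional vectors.
- The PRN head is modeled as a per-shot sigmoid followed by an average over the K shots.
- All helper names are hypothetical.

This is not the Meta R-CNN or mmfewshot implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function used here to squash each channel into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def class_attentive_vector(support_feats: List[Vector]) -> Vector:
    """ASSUMPTION: stand-in for the PRN head. Each shot's pooled feature is
    passed through a sigmoid, then the K per-shot vectors are averaged."""
    k = len(support_feats)
    per_shot = [[sigmoid(x) for x in feat] for feat in support_feats]
    return [sum(col) / k for col in zip(*per_shot)]


def remodel_roi(roi_feat: Vector, attn: Vector) -> Vector:
    """Channel-wise soft attention: scale each RoI channel by the class weight."""
    if len(roi_feat) != len(attn):
        raise ValueError("RoI feature and class vector need the same channel count")
    return [z * a for z, a in zip(roi_feat, attn)]


def remodel_for_all_classes(roi_feat: Vector,
                            class_vectors: Dict[str, Vector]) -> Dict[str, Vector]:
    """Produce one class-specific RoI feature per class-attentive vector."""
    return {name: remodel_roi(roi_feat, vec) for name, vec in class_vectors.items()}


if __name__ == "__main__":
    dog = class_attentive_vector([[0.0, 0.0, 0.0]])  # sigmoid(0) = 0.5 per channel
    print(remodel_roi([1.0, 2.0, -1.0], dog))  # [0.5, 1.0, -0.5]
```

Each RoI feature is re-weighted once per class vector, so in this sketch the predictor head would receive one remodeled feature per class.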

Citation

@inproceedings{yan2019meta,
    title={Meta r-cnn: Towards general solver for instance-level low-shot learning},
    author={Yan, Xiaopeng and Chen, Ziliang and Xu, Anni and Wang, Xiaoxi and Liang, Xiaodan and Lin, Liang},
    booktitle={Proceedings of the IEEE International Conference on Computer Vision},
    year={2019}
}

Note: All the reported results use the data split released by the official TFA repo. Currently, each setting is evaluated with only one fixed few-shot dataset. Please refer to Data Preparation for more details about the dataset and data preparation.

How to reproduce Meta R-CNN

Following the original implementation, training consists of two steps:

Step 1: Base training
- Use all the images and annotations of the base classes to train a base model.
Step 2: Few-shot fine-tuning
- Use the base model from step 1 as the model initialization and further fine-tune it on the few-shot datasets.
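The two steps can be sketched in pure Python to show which annotations each step sees. The sketch rests on these assumptions:

- VOC split 1 uses the novel classes bird, bus, cow, motorbike and sofa.
- Annotations are hypothetical (image_id, class_name) tuples.
- K shots per class are drawn with a seeded RNG.

The reported results do not sample this way. They use the fixed split files released by TFA.

```python
import random
from typing import Dict, List, Tuple

# The 20 PASCAL VOC classes.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
# ASSUMPTION: novel classes of VOC split 1 as defined by the TFA splits.
NOVEL_SPLIT1 = {"bird", "bus", "cow", "motorbike", "sofa"}
BASE_SPLIT1 = [c for c in VOC_CLASSES if c not in NOVEL_SPLIT1]

Annotation = Tuple[str, str]  # hypothetical (image_id, class_name) record


def base_training_set(anns: List[Annotation]) -> List[Annotation]:
    """Step 1: keep every annotation that belongs to a base class."""
    return [a for a in anns if a[1] in BASE_SPLIT1]


def few_shot_set(anns: List[Annotation], shots: int, seed: int = 0) -> List[Annotation]:
    """Step 2: draw up to `shots` annotations for every class, base and novel."""
    rng = random.Random(seed)  # a fixed seed yields the same subset on every run
    by_class: Dict[str, List[Annotation]] = {}
    for a in anns:
        by_class.setdefault(a[1], []).append(a)
    subset: List[Annotation] = []
    for cls in VOC_CLASSES:
        pool = by_class.get(cls, [])
        subset.extend(rng.sample(pool, min(shots, len(pool))))
    return subset


if __name__ == "__main__":
    toy = [(f"img{i}", c) for i in range(3) for c in VOC_CLASSES]
    print(len(base_training_set(toy)), len(few_shot_set(toy, shots=1)))
```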

An example of the VOC split1 1-shot setting with 8 GPUs

# step1: base training for voc split1
bash ./tools/detection/dist_train.sh \
    configs/detection/meta_rcnn/voc/split1/meta-rcnn_r101_c4_8xb4_voc-split1_base-training.py 8

# step2: few shot fine-tuning
bash ./tools/detection/dist_train.sh \
    configs/detection/meta_rcnn/voc/split1/meta-rcnn_r101_c4_8xb4_voc-split1_1shot-fine-tuning.py 8
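The remaining VOC settings follow the same two-step pattern. The hypothetical helper below prints the commands for every split and shot listed in the results tables. It assumes every config follows the naming pattern of the split1 examples above, so check that each path exists before launching.

```python
from typing import List

CONFIG_ROOT = "configs/detection/meta_rcnn/voc"
SHOTS = [1, 2, 3, 5, 10]  # shot settings reported in the VOC results table


def base_config(split: int) -> str:
    """ASSUMPTION: base-training configs follow the split1 naming pattern."""
    return (f"{CONFIG_ROOT}/split{split}/"
            f"meta-rcnn_r101_c4_8xb4_voc-split{split}_base-training.py")


def fine_tune_config(split: int, shot: int) -> str:
    """ASSUMPTION: fine-tuning configs follow the split1 naming pattern."""
    return (f"{CONFIG_ROOT}/split{split}/"
            f"meta-rcnn_r101_c4_8xb4_voc-split{split}_{shot}shot-fine-tuning.py")


def train_command(config: str, gpus: int = 8) -> str:
    """Build the same dist_train.sh invocation used in the example above."""
    return f"bash ./tools/detection/dist_train.sh {config} {gpus}"


def commands_for_split(split: int, gpus: int = 8) -> List[str]:
    """Base training first, then one fine-tuning run per shot setting."""
    cmds = [train_command(base_config(split), gpus)]
    cmds += [train_command(fine_tune_config(split, k), gpus) for k in SHOTS]
    return cmds


if __name__ == "__main__":
    for split in (1, 2, 3):
        for cmd in commands_for_split(split):
            print(cmd)
```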

Note:

The default path of the reshaped base model used for initialization in step2 is work_dirs/{BASE TRAINING CONFIG}/base_model_random_init_bbox_head.pth. If the model is saved to a different path, update the load_from argument in the step2 few-shot fine-tuning configs instead of using resume_from.
To use a pre-trained checkpoint, set load_from to the downloaded checkpoint path.

Results on VOC dataset

Note:

The official implementation uses a batch size of 1x4 for training, while we use 8x4.
For few-shot fine-tuning, we only fine-tune the bbox head, and the iterations or training strategy may not be optimal in the 8-GPU setting.
Base training uses 200 support instances per base class for testing.
The performance of base training and few-shot fine-tuning can be unstable, even with the same random seed. To reproduce the reported few-shot results, we highly recommend using the released model for few-shot fine-tuning.
Difficult samples are used in the base-training query set, but not in the support set or in the few-shot setting.
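The support-set rules in the notes above can be sketched as follows. Instances are assumed to be hypothetical dicts with `label` and `difficult` keys. The sketch applies three rules:

- Difficult instances stay in the base-training query set.
- Difficult instances are excluded from the support set.
- At most 200 support instances are kept per base class.

The seeded random selection is an illustrative choice, not necessarily how the repo picks support instances.

```python
import random
from typing import Dict, List


def build_support_set(instances: List[Dict], base_classes: List[str],
                      per_class: int = 200, seed: int = 0) -> Dict[str, List[Dict]]:
    """Keep up to `per_class` non-difficult instances for each base class."""
    rng = random.Random(seed)
    support: Dict[str, List[Dict]] = {}
    for cls in base_classes:
        # Difficult instances never enter the support set.
        pool = [inst for inst in instances
                if inst["label"] == cls and not inst["difficult"]]
        support[cls] = rng.sample(pool, min(per_class, len(pool)))
    return support


def build_query_set(instances: List[Dict]) -> List[Dict]:
    """The base-training query set keeps difficult instances as well."""
    return list(instances)


if __name__ == "__main__":
    toy = [{"label": "dog", "difficult": i % 2 == 0} for i in range(10)]
    print(len(build_support_set(toy, ["dog"])["dog"]), len(build_query_set(toy)))  # 5 10
```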

Base Training

Arch	Split	Base AP50	ckpt	log
r101 c4	1	72.8	ckpt	log
r101 c4	2	73.3	ckpt	log
r101 c4	3	74.2	ckpt	log

Few Shot Fine-tuning

Arch	Split	Shot	Base AP50	Novel AP50	ckpt	log
r101 c4	1	1	58.8	40.2	ckpt	log
r101 c4	1	2	67.7	49.9	ckpt	log
r101 c4	1	3	69.0	54.0	ckpt	log
r101 c4	1	5	70.8	55.0	ckpt	log
r101 c4	1	10	71.7	56.3	ckpt	log
r101 c4	2	1	61.0	27.3	ckpt	log
r101 c4	2	2	69.5	34.8	ckpt	log
r101 c4	2	3	71.0	39.0	ckpt	log
r101 c4	2	5	71.7	36.0	ckpt	log
r101 c4	2	10	72.6	40.1	ckpt	log
r101 c4	3	1	63.0	32.0	ckpt	log
r101 c4	3	2	70.1	37.9	ckpt	log
r101 c4	3	3	71.3	42.5	ckpt	log
r101 c4	3	5	72.3	49.6	ckpt	log
r101 c4	3	10	73.2	49.1	ckpt	log

Results on COCO dataset

Note:

The official implementation uses a batch size of 1x4 for training, while we use 8x4.
For few-shot fine-tuning, we only fine-tune the bbox head, and the iterations or training strategy may not be optimal in the 8-GPU setting.
Base training uses 200 support instances per base class for testing.
The performance of base training and few-shot fine-tuning can be unstable, even with the same random seed. To reproduce the reported few-shot results, we highly recommend using the released model for few-shot fine-tuning.
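The batch-size difference above is plain arithmetic. Following the OpenMMLab config naming convention, a tag such as `8xb4` (as in the VOC config names) reads as 8 GPUs with 4 images per GPU. That gives a total batch of 32 images, versus 4 in the official 1x4 setup. The hypothetical helper below parses the tag from a config name; the regex is an assumption about the naming pattern.

```python
import re
from typing import Tuple


def parse_batch_tag(config_name: str) -> Tuple[int, int]:
    """Return (num_gpus, images_per_gpu) from an OpenMMLab-style 'NxbM' tag."""
    match = re.search(r"_(\d+)xb(\d+)_", config_name)
    if match is None:
        raise ValueError(f"no NxbM tag found in {config_name!r}")
    return int(match.group(1)), int(match.group(2))


def total_batch(num_gpus: int, images_per_gpu: int) -> int:
    """Images processed per iteration across all GPUs."""
    return num_gpus * images_per_gpu


if __name__ == "__main__":
    gpus, per_gpu = parse_batch_tag("meta-rcnn_r101_c4_8xb4_voc-split1_base-training.py")
    ours = total_batch(gpus, per_gpu)  # 8 x 4 = 32
    official = total_batch(1, 4)       # 1 x 4 = 4
    print(ours, official, ours // official)  # 32 4 8
```

A common heuristic when the total batch changes is to scale the learning rate proportionally (the linear scaling rule). Whether the released configs do so is not stated here.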

Base Training

Arch	Base mAP	ckpt	log
r50 c4	27.8	ckpt	log

Few Shot Fine-tuning

Arch	Shot	Base mAP	Novel mAP	ckpt	log
r50 c4	10	25.1	9.4	ckpt	log
r50 c4	30	26.9	11.5	ckpt	log
