Multi-Scale Positive Sample Refinement for Few-Shot Object Detection (ECCV'2020)
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited. Unlike previous attempts that exploit few-shot classification techniques to facilitate FSOD, this work highlights the necessity of handling the problem of scale variations, which is challenging due to the unique sample distribution. To this end, we propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD. It generates multi-scale positive samples as object pyramids and refines the prediction at various scales. We demonstrate its advantage by integrating it as an auxiliary branch into the popular architecture of Faster R-CNN with FPN, delivering a strong FSOD solution. Several experiments are conducted on PASCAL VOC and MS COCO, and the proposed approach achieves state-of-the-art results and significantly outperforms other counterparts, which shows its effectiveness. Code is available at https://github.com/jiaxi-wu/MPSR.
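As an illustrative sketch only (not the authors' code), the object-pyramid idea can be pictured as resizing each cropped positive instance to a fixed set of scales before feeding it to the refinement branch; the scale set and aspect-ratio handling below are assumptions:

```python
# Illustrative sketch of the object-pyramid idea in MPSR (not the official code).
# Each positive instance is resized to several target scales so the refinement
# branch sees the same object at multiple resolutions.
# PYRAMID_SCALES is an assumed scale set for illustration.

PYRAMID_SCALES = (32, 64, 128, 256, 512)

def object_pyramid_sizes(box_w, box_h, scales=PYRAMID_SCALES):
    """Return (w, h) resize targets that match each scale on the longer
    side while preserving the box's aspect ratio."""
    sizes = []
    for s in scales:
        if box_w >= box_h:
            w, h = s, max(1, round(s * box_h / box_w))
        else:
            w, h = max(1, round(s * box_w / box_h)), s
        sizes.append((w, h))
    return sizes

# A 100x50 box yields five progressively larger crops of the same aspect ratio.
print(object_pyramid_sizes(100, 50))  # -> [(32, 16), (64, 32), (128, 64), (256, 128), (512, 256)]
```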
```bibtex
@inproceedings{wu2020mpsr,
  title={Multi-Scale Positive Sample Refinement for Few-Shot Object Detection},
  author={Wu, Jiaxi and Liu, Songtao and Huang, Di and Wang, Yunhong},
  booktitle={European Conference on Computer Vision},
  year={2020}
}
```
Note: All reported results use the data split released by the TFA official repo. Currently, each setting is evaluated with only one fixed few-shot dataset. Please refer to Data Preparation for more details about the dataset and data preparation.
Following the original implementation, training consists of 2 steps:

- **Step1: Base training**: use all the images and annotations of the base classes to train a base model.
- **Step2: Few-shot fine-tuning**: use the base model from step1 as initialization and further fine-tune it with the few-shot dataset.
```shell
# step1: base training for voc split1
bash ./tools/detection/dist_train.sh \
    configs/detection/mpsr/voc/split1/mpsr_r101_fpn_2xb2_voc-split1_base-training.py 8

# step2: few shot fine-tuning for voc split1
bash ./tools/detection/dist_train.sh \
    configs/detection/mpsr/voc/split1/mpsr_r101_fpn_2xb2_voc-split1_1shot-fine-tuning.py 8
```
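The commands above show split 1 with the 1-shot config; a small helper like the following (hypothetical, assuming all configs follow the naming pattern above) can enumerate the fine-tuning commands for the other splits and shot settings:

```python
# Hypothetical helper that prints the fine-tuning command for every VOC
# split/shot combination, assuming config files follow the naming pattern
# shown above (e.g. mpsr_r101_fpn_2xb2_voc-split1_1shot-fine-tuning.py).

SPLITS = (1, 2, 3)
SHOTS = (1, 2, 3, 5, 10)

def finetune_cmd(split, shot, gpus=8):
    cfg = (f"configs/detection/mpsr/voc/split{split}/"
           f"mpsr_r101_fpn_2xb2_voc-split{split}_{shot}shot-fine-tuning.py")
    return f"bash ./tools/detection/dist_train.sh {cfg} {gpus}"

for split in SPLITS:
    for shot in SHOTS:
        print(finetune_cmd(split, shot))
```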
Note:
- The default output path of the reshaped base model used in step2 is `work_dirs/{BASE TRAINING CONFIG}/base_model_random_init_bbox_head.pth`. When the model is saved to a different path, please update the argument `load_from` in the step2 few-shot fine-tuning configs instead of using `resume_from`.
- To use a pre-trained checkpoint, please set `load_from` to the downloaded checkpoint path.
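For example, a fine-tuning config could point `load_from` at the reshaped base checkpoint like this (a sketch; the work directory name below is illustrative):

```python
# Illustrative fragment of a few-shot fine-tuning config (path is an example).
# `load_from` initializes weights from the reshaped base checkpoint, while
# `resume_from` stays None since this starts a new fine-tuning run.
load_from = ('work_dirs/mpsr_r101_fpn_2xb2_voc-split1_base-training/'
             'base_model_random_init_bbox_head.pth')
resume_from = None
```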
Note:
- We follow the official implementation and use a batch size of 2x2 for training.
- The performance in the few-shot setting can be unstable, even with the same random seed. To reproduce the reported few-shot results, it is highly recommended to use the released model for few-shot fine-tuning.
- Difficult samples are not used in base training or the few-shot setting.
Base training results on PASCAL VOC:

Arch | Split | Base AP50 | ckpt | log |
---|---|---|---|---|
r101 fpn | 1 | 80.5 | ckpt | log |
r101 fpn | 2 | 81.3 | ckpt | log |
r101 fpn | 3 | 81.8 | ckpt | log |
r101 fpn* | 1 | 77.8 | ckpt | - |
r101 fpn* | 2 | 78.3 | ckpt | - |
r101 fpn* | 3 | 77.8 | ckpt | - |
Note:
- * means the model is converted from the official repo. We find that a base model trained with mmfewshot yields worse performance after fine-tuning, especially in the 1/2/3-shot settings, even though its base-training performance is higher. We will continue to investigate and improve it.
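Converting a checkpoint from the official repo essentially amounts to renaming state-dict keys to match the target model's module names; a toy sketch with plain dicts (the rename rules and key names below are hypothetical, not the actual conversion script):

```python
# Toy sketch of checkpoint key remapping (NOT the actual conversion script;
# the prefixes below are hypothetical). Converting a checkpoint from another
# codebase typically reduces to renaming state-dict keys.
RENAME_RULES = [
    ("backbone.body.", "backbone."),
    ("roi_heads.box_head.", "roi_head.bbox_head."),
]

def remap_keys(state_dict):
    """Rewrite each key with the first matching prefix rule."""
    out = {}
    for key, value in state_dict.items():
        for src, dst in RENAME_RULES:
            if key.startswith(src):
                key = dst + key[len(src):]
                break
        out[key] = value
    return out

print(sorted(remap_keys({"backbone.body.conv1.weight": 0,
                         "roi_heads.box_head.fc1.weight": 1})))
# -> ['backbone.conv1.weight', 'roi_head.bbox_head.fc1.weight']
```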
Few-shot fine-tuning results on PASCAL VOC:

Arch | Split | Shot | Base AP50 | Novel AP50 | ckpt | log |
---|---|---|---|---|---|---|
r101 fpn* | 1 | 1 | 60.6 | 38.5 | ckpt | log |
r101 fpn* | 1 | 2 | 65.9 | 45.9 | ckpt | log |
r101 fpn* | 1 | 3 | 68.1 | 49.2 | ckpt | log |
r101 fpn* | 1 | 5 | 69.2 | 55.8 | ckpt | log |
r101 fpn* | 1 | 10 | 71.2 | 58.7 | ckpt | log |
r101 fpn* | 2 | 1 | 61.0 | 25.8 | ckpt | log |
r101 fpn* | 2 | 2 | 66.9 | 29.0 | ckpt | log |
r101 fpn* | 2 | 3 | 67.6 | 40.6 | ckpt | log |
r101 fpn* | 2 | 5 | 70.4 | 41.5 | ckpt | log |
r101 fpn* | 2 | 10 | 71.7 | 47.1 | ckpt | log |
r101 fpn* | 3 | 1 | 57.9 | 34.6 | ckpt | log |
r101 fpn* | 3 | 2 | 65.7 | 41.0 | ckpt | log |
r101 fpn* | 3 | 3 | 69.1 | 44.1 | ckpt | log |
r101 fpn* | 3 | 5 | 70.4 | 48.5 | ckpt | log |
r101 fpn* | 3 | 10 | 72.5 | 51.7 | ckpt | log |
- * means using the base model converted from the official repo.
Note:
- We follow the official implementation and use a batch size of 2x2 for training.
- The performance of base training and few-shot fine-tuning can be unstable, even with the same random seed. To reproduce the reported few-shot results, it is highly recommended to use the released model for few-shot fine-tuning.
Base training results on MS COCO:

Arch | Base mAP | ckpt | log |
---|---|---|---|
r101 fpn | 34.6 | ckpt | log |
Few-shot fine-tuning results on MS COCO:

Arch | Shot | Base mAP | Novel mAP | ckpt | log |
---|---|---|---|---|---|
r101 fpn | 10 | 23.2 | 12.6 | ckpt | log |
r101 fpn | 30 | 25.2 | 18.1 | ckpt | log |