
FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding (CVPR'2021)

Abstract

Emerging interests have been brought to recognize previously unseen objects given very few training examples, known as few-shot object detection (FSOD). Recent research demonstrates that good feature embedding is the key to reaching favorable few-shot learning performance. We observe that object proposals with different Intersection-over-Union (IoU) scores are analogous to the intra-image augmentation used in contrastive approaches. We exploit this analogy and incorporate supervised contrastive learning to achieve more robust object representations in FSOD. We present Few-Shot object detection via Contrastive proposals Encoding (FSCE), a simple yet effective approach to learning contrastive-aware object proposal encodings that facilitate the classification of detected objects. We notice that the degradation of average precision (AP) for rare objects mainly comes from misclassifying novel instances as confusable classes, and we ease this misclassification issue by promoting instance-level intra-class compactness and inter-class variance via our contrastive proposal encoding loss (CPE loss). Our design outperforms current state-of-the-art works in any shot and all data splits, with up to +8.8% on the standard benchmark PASCAL VOC and +2.7% on the challenging COCO benchmark. Code is available at: https://github.com/bsun0802/FSCE.git

Citation

@inproceedings{sun2021fsce,
    title={FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding},
    author={Sun, Bo and Li, Banghuai and Cai, Shengcai and Yuan, Ye and Zhang, Chi},
    booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)},
    year={2021}
}

Note: ALL the reported results use the data split released from the FSCE official repo, unless stated otherwise. Currently, each setting is only evaluated with one fixed few shot dataset. Please refer to here to get more details about the dataset and data preparation.

How to reproduce FSCE

Following the original implementation, it consists of 3 steps:

  • Step1: Base training

    • use all the images and annotations of the base classes to train a base model.
  • Step2: Reshape the bbox head of the base model:

    • create a new bbox head for fine-tuning on all classes (base classes + novel classes).
    • the weights of the base classes in the new bbox head are initialized from the corresponding weights of the base model.
    • the weights of the novel classes in the new bbox head are randomly initialized.
  • Step3: Few shot fine-tuning:

    • use the reshaped model from step2 as initialization and further fine-tune the bbox head with the few shot dataset.

An example of the VOC split1 1-shot setting with 8 GPUs

# step1: base training for voc split1
bash ./tools/detection/dist_train.sh \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_base-training.py 8

# step2: reshape the bbox head of base model for few shot fine-tuning
python -m tools.detection.misc.initialize_bbox_head \
    --src1 work_dirs/fsce_r101_fpn_voc-split1_base-training/latest.pth \
    --method randinit \
    --save-dir work_dirs/fsce_r101_fpn_voc-split1_base-training

# step3: few shot fine-tuning
bash ./tools/detection/dist_train.sh \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_1shot-fine-tuning.py 8
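If 8 GPUs are not available, the same configs should also run through the non-distributed entry point. Below is a minimal sketch, assuming the mmdet-style tools/detection/train.py script that dist_train.sh wraps; note that changing the number of GPUs changes the effective batch size, so the learning rate in the config may need to be scaled accordingly.

```shell
# Hypothetical single-GPU run of step1 (assumes tools/detection/train.py,
# the script wrapped by dist_train.sh). Adjust the learning rate in the
# config if the effective batch size differs from the 8-GPU setup.
python ./tools/detection/train.py \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_base-training.py
```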

Note:

  • The default output path of the reshaped base model in step2 is work_dirs/{BASE TRAINING CONFIG}/base_model_random_init_bbox_head.pth. When the model is saved to a different path, please update the load_from argument in the step3 few shot fine-tuning config instead of using resume_from.
  • To use a downloaded pre-trained checkpoint, please set load_from to the checkpoint path (see the sketch after this list).
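As a hedged sketch, the step3 fine-tuning run can be pointed at a reshaped (or downloaded) checkpoint stored under a non-default path as shown below; this assumes dist_train.sh forwards extra arguments to the underlying train script and that the mmdet-style --cfg-options override is available. Alternatively, edit the fine-tuning config and set load_from = '/path/to/checkpoint.pth' directly.

```shell
# Hypothetical override of load_from on the command line (assumes the
# mmdet-style --cfg-options mechanism); /path/to/... is a placeholder.
bash ./tools/detection/dist_train.sh \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_1shot-fine-tuning.py 8 \
    --cfg-options load_from=/path/to/base_model_random_init_bbox_head.pth
```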

Results on VOC dataset

Base Training

| arch | contrastive loss | Split | Base AP50 | ckpt(step1) | ckpt(step2) | log |
| --- | --- | --- | --- | --- | --- | --- |
| r101_fpn | N | 1 | 80.9 | ckpt | ckpt | log |
| r101_fpn | N | 2 | 82.0 | ckpt | ckpt | log |
| r101_fpn | N | 3 | 82.1 | ckpt | ckpt | log |

Note:

  • All the base training configs are the same as TFA's. Therefore, the few shot fine-tuning can directly reuse the reshaped base model of TFA by creating a symlink or copying the whole checkpoint to the corresponding folder (see the sketch after these notes). Also, the released base training checkpoint is the same as that of TFA.
  • The performance of the same few shot setting using different base training models can be dramatically unstable (AP50 can fluctuate by 5.0 or more), even when their mAPs on the base classes are very close.
  • For now, the way to get a good base model is to train the base model with different random seeds (see the sketch after these notes). Also, the random seed used in this code base may not be the optimal one, and it is possible to get higher results with other random seeds. However, using the same random seed still cannot guarantee identical results each time, due to some nondeterministic CUDA operations. We will continue to investigate and improve this.
  • To reproduce the reported few shot results, it is highly recommended to use the released step2 model for few shot fine-tuning.
  • Difficult samples are used in base training, but not in the few shot settings.
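The two notes above about reusing a TFA base model and retraining with a different seed can be sketched as follows; the TFA work directory name and the --seed/--deterministic flags are assumptions based on the usual mmdet-style tooling, not guaranteed paths or options.

```shell
# Hypothetical: reuse a reshaped TFA base model for FSCE fine-tuning by
# symlinking it into the FSCE work directory (directory names are placeholders).
ln -s $(pwd)/work_dirs/tfa_r101_fpn_voc-split1_base-training/base_model_random_init_bbox_head.pth \
      work_dirs/fsce_r101_fpn_voc-split1_base-training/base_model_random_init_bbox_head.pth

# Hypothetical: rerun base training with a different random seed (assumes the
# mmdet-style --seed / --deterministic flags are forwarded by dist_train.sh).
bash ./tools/detection/dist_train.sh \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_base-training.py 8 \
    --seed 42 --deterministic
```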

Few Shot Fine-tuning

| arch | contrastive loss | Split | Shot | Base AP50 | Novel AP50 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- |
| r101_fpn | N | 1 | 1 | 78.4 | 41.2 | ckpt | log |
| r101_fpn | N | 1 | 2 | 77.8 | 51.1 | ckpt | log |
| r101_fpn | N | 1 | 3 | 76.1 | 49.3 | ckpt | log |
| r101_fpn | N | 1 | 5 | 75.9 | 59.4 | ckpt | log |
| r101_fpn | N | 1 | 10 | 76.4 | 62.6 | ckpt | log |
| r101_fpn | Y | 1 | 3 | 75.0 | 48.9 | ckpt | log |
| r101_fpn | Y | 1 | 5 | 75.0 | 58.8 | ckpt | log |
| r101_fpn | Y | 1 | 10 | 75.5 | 63.3 | ckpt | log |
| r101_fpn | N | 2 | 1 | 79.8 | 25.0 | ckpt | log |
| r101_fpn | N | 2 | 2 | 78.0 | 30.6 | ckpt | log |
| r101_fpn | N | 2 | 3 | 76.4 | 43.4 | ckpt | log |
| r101_fpn | N | 2 | 5 | 77.2 | 45.3 | ckpt | log |
| r101_fpn | N | 2 | 10 | 77.5 | 50.4 | ckpt | log |
| r101_fpn | Y | 2 | 3 | 76.3 | 43.3 | ckpt | log |
| r101_fpn | Y | 2 | 5 | 76.6 | 45.9 | ckpt | log |
| r101_fpn | Y | 2 | 10 | 76.8 | 50.4 | ckpt | log |
| r101_fpn | N | 3 | 1 | 79.0 | 39.8 | ckpt | log |
| r101_fpn | N | 3 | 2 | 78.4 | 41.5 | ckpt | log |
| r101_fpn | N | 3 | 3 | 76.1 | 47.1 | ckpt | log |
| r101_fpn | N | 3 | 5 | 77.4 | 54.1 | ckpt | log |
| r101_fpn | N | 3 | 10 | 77.7 | 57.4 | ckpt | log |
| r101_fpn | Y | 3 | 3 | 75.6 | 48.1 | ckpt | log |
| r101_fpn | Y | 3 | 5 | 76.2 | 55.7 | ckpt | log |
| r101_fpn | Y | 3 | 10 | 77.0 | 57.9 | ckpt | log |

Note:

  • Following the original implementation, the contrastive loss is only added in the VOC 3/5/10 shot settings, while in the VOC 1/2 shot settings only the fc_cls and fc_reg layers are fine-tuned.
  • Some arguments in the configs are different from the official code; for example, the official code uses aug_test in some settings, while none of the results reported above use aug_test.
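To check a fine-tuned or downloaded checkpoint against the numbers above, the usual mmdet-style test entry point should work. This is a hedged sketch assuming tools/detection/dist_test.sh takes a config, a checkpoint and a GPU count, and that mAP is the evaluation metric for VOC.

```shell
# Hypothetical evaluation of a 1-shot fine-tuned checkpoint on VOC split1
# (assumes the mmdet-style dist_test.sh entry point and the mAP metric).
bash ./tools/detection/dist_test.sh \
    configs/detection/fsce/voc/split1/fsce_r101_fpn_voc-split1_1shot-fine-tuning.py \
    work_dirs/fsce_r101_fpn_voc-split1_1shot-fine-tuning/latest.pth 8 --eval mAP
```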

Results on COCO dataset

Base Training

| arch | contrastive loss | Base mAP | ckpt(step1) | ckpt(step2) | log |
| --- | --- | --- | --- | --- | --- |
| r101_fpn | N | 39.50 | ckpt | ckpt | log |

Few Shot Fine-tuning

| arch | shot | contrastive loss | Base mAP | Novel mAP | ckpt | log |
| --- | --- | --- | --- | --- | --- |
| r101_fpn | 10 | N | 31.7 | 11.7 | ckpt | log |
| r101_fpn | 30 | N | 32.3 | 16.4 | ckpt | log |

Note:

  • Some arguments in the configs are different from the official code; for example, the official code uses aug_test in some settings, while none of the results reported above use aug_test.