
(ECCV 2024) Can OOD Object Detectors Learn from Foundation Models?


CVMI-Lab/SyncOOD


Can OOD Object Detectors Learn from Foundation Models?

The University of Hong Kong 
† corresponding author

European Conference on Computer Vision (ECCV) 2024

  • We would like to say YES to the title. We introduce SyncOOD to access open-world knowledge encapsulated within off-the-shelf foundation models by synthesizing meaningful OOD data.
  • SyncOOD provides an automatic, transparent, controllable, and low-cost pipeline for synthesizing scene-level images containing novel objects with annotation boxes via image editing.
  • The synthetic OOD samples are filtered and employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/out-of-distribution (OOD) decision boundaries with minimal data usage.
  • Explore more in the paper: Can OOD Object Detectors Learn from Foundation Models? in ECCV 2024.

Quick Guide

This repository contains the code of SyncOOD in two parts: synthesizing novel samples and training an OOD detector.

Abstract

Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, such as Stable Diffusion, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples, thereby enhancing OOD object detection. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models. This offers the model access to open-world knowledge encapsulated within off-the-shelf foundation models. The synthetic OOD samples are then employed to augment the training of a lightweight, plug-and-play OOD detector, thus effectively optimizing the in-distribution (ID)/OOD decision boundaries. Extensive experiments across multiple benchmarks demonstrate that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal synthetic data usage.

Key Contributions

  • We investigate and unlock the potential of text-to-image generative models trained on large-scale open-set data for synthesizing OOD objects in object detection tasks.
  • We introduce an automated data curation process for obtaining controllable, annotated scene-level synthetic OOD images for OOD object detection, which utilizes LLMs for novel concept discovery and visual foundation models for data annotation and filtering.
  • We discover that maintaining ID/OOD image context consistency and obtaining more accurate OOD annotation bounding boxes are crucial for synthesized data to be effective in OOD object detection.
  • Comprehensive experiments on multiple benchmarks demonstrate the effectiveness of our method, as we significantly outperform existing state-of-the-art approaches while using minimal synthetic data.

Citation

If you find this work useful, please consider citing:

@InProceedings{liu2024can,
	author    = {Liu, Jiahui and Wen, Xin and Zhao, Shizhen and Chen, Yingxian and Qi, Xiaojuan},
	title     = {Can OOD Object Detectors Learn from Foundation Models?},
	booktitle = {European Conference on Computer Vision},
	year      = {2024}
}

Acknowledgements


Synthesize Novel Samples

We aim to develop an automatic, transparent, controllable, and low-cost pipeline that synthesizes scene-level images containing novel objects and provides COCO-format annotations, to help with 1) training OOD detectors and 2) exploring more general open-world tasks (coming soon).
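Since the synthesis code is not yet released, the following is only a minimal sketch of what a COCO-format annotation for one synthetic image could look like. The helper names (`xyxy_to_xywh`, `make_coco_record`), the file name, and the single `ood` category are hypothetical and are not taken from this repository; only the field layout (`images`, `annotations`, `categories`, and `bbox` as `[x, y, width, height]`) follows the COCO detection format.

```python
import json


def xyxy_to_xywh(box):
    """Convert a corner-style box (x1, y1, x2, y2) to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def make_coco_record(image_id, file_name, width, height, boxes_xyxy,
                     category_id=1, category_name="ood"):
    """Build a COCO-style dict for one image (hypothetical helper, for illustration)."""
    annotations = []
    for i, box in enumerate(boxes_xyxy):
        x, y, w, h = xyxy_to_xywh(box)
        annotations.append({
            "id": i + 1,                 # annotation id, unique within this record
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x, y, w, h],        # COCO uses top-left corner plus size
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": image_id, "file_name": file_name,
                    "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": category_id, "name": category_name}],
    }


# Assumed example: one 640x480 synthetic image with one novel-object box.
record = make_coco_record(1, "synthetic_000001.jpg", 640, 480,
                          [(100, 120, 300, 360)])
print(json.dumps(record, indent=2))
```

Keeping boxes in COCO's `[x, y, width, height]` layout means the resulting file can be read by standard COCO tooling without a custom loader.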


Train an OOD Detector

We use synthetic out-of-distribution (OOD) samples and original in-distribution (ID) samples to efficiently train a lightweight, plug-and-play OOD detector, achieving state-of-the-art OOD object detection.
We mainly conduct the experiments on Ubuntu 20.04 with GeForce RTX 3090 GPUs (coming soon).
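Until the training code is released, the idea of fitting a lightweight ID/OOD head on top of existing features can be illustrated with a toy sketch. This is not the authors' implementation: a plain logistic-regression head is swapped in for the OOD detector, and random 2-D points stand in for object features of ID and synthetic OOD samples. All names (`train_ood_head`, `ood_score`) and hyperparameters are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ood_head(id_feats, ood_feats, epochs=200, lr=0.1):
    """Fit a logistic-regression head with SGD: label 0 = ID, label 1 = OOD."""
    dim = len(id_feats[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 0.0) for x in id_feats] + [(x, 1.0) for x in ood_feats]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def ood_score(w, b, x):
    """Estimated probability that feature vector x is OOD."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy stand-ins for object features (assumption: real features would come
# from a detector, which is not modelled here).
rng = random.Random(42)
id_feats = [[rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)] for _ in range(50)]
ood_feats = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(50)]

w, b = train_ood_head(id_feats, ood_feats)
print("score for an ID-like feature: ", ood_score(w, b, [-2.0, -2.0]))
print("score for an OOD-like feature:", ood_score(w, b, [2.0, 2.0]))
```

The head only consumes precomputed features, which is what makes this kind of design plug-and-play: in the toy setting nothing upstream of the features needs to be retrained.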
