Download the COCO dataset.
Create the annotation JSON files for the open-vocabulary setting. We use the scripts provided by OVR-CNN, then add object proposals that may cover the novel classes. Alternatively, you can download our pre-generated JSON file from Google Drive.
Extract the CLIP image features. You can download our pre-generated file from Google Drive, or use the script to extract them yourself.
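To give a rough idea of what the annotation step involves, here is a minimal sketch of restricting a COCO-format annotation dictionary to a set of base classes. It is not the OVR-CNN script: the function name `split_coco_by_category`, the toy data, and the choice of `person` as the only base class are all hypothetical, and the real base/novel split is defined by the OVR-CNN scripts.

```python
import copy


def split_coco_by_category(coco, base_names):
    """Return a copy of a COCO-format dict restricted to the given category names.

    Hypothetical helper for illustration only; the actual open-vocabulary
    annotation files are produced by the OVR-CNN scripts.
    """
    base_names = set(base_names)
    # Keep only the categories whose name is in the base set.
    kept_categories = [c for c in coco["categories"] if c["name"] in base_names]
    kept_ids = {c["id"] for c in kept_categories}
    # Keep only the annotations that refer to a kept category.
    kept_annotations = [a for a in coco["annotations"] if a["category_id"] in kept_ids]
    # Copy every other top-level field (e.g. "images") unchanged.
    out = {
        key: copy.deepcopy(value)
        for key, value in coco.items()
        if key not in ("categories", "annotations")
    }
    out["categories"] = copy.deepcopy(kept_categories)
    out["annotations"] = copy.deepcopy(kept_annotations)
    return out


# Toy COCO-style data (made up for this example, not taken from the dataset).
toy = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "person"}, {"id": 17, "name": "cat"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 17, "bbox": [50, 60, 70, 80]},
    ],
}

# Treat "person" as the only base class, purely as a placeholder.
base_only = split_coco_by_category(toy, ["person"])
```

The helper returns a new dictionary and leaves the input untouched, so the same loaded annotation file can be filtered several times with different class lists.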
Under preparation.