- Download imagenet_captions.zip from https://github.com/mlfoundations/imagenet-captions and unzip it to obtain imagenet_captions.json.
- Run preprocess_imagenet_captions.py. This will create a dataframe in this folder and a dictionary containing the WNID of each image at imagenet-captions/processed/labels.
- Download the ILSVRC2012 training images from https://www.image-net.org/download.php and place them under ilsvrc2012/ILSVRC2012_img_train. This is only a temporary location.
- Run drop_imagenet_examples_wo_caption.py. This will copy the ILSVRC training images that are present in ImageNet-Captions to ilsvrc2012/ILSVRC2012_img_train_selected. You can then remove the images left in ilsvrc2012/ILSVRC2012_img_train.
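
For orientation, the selection step could look roughly like the sketch below. This is not the code of drop_imagenet_examples_wo_caption.py; it is a minimal illustration that assumes the training images are laid out as `<wnid>/<wnid>_<index>.JPEG` and that you already have the list of captioned filenames. The function names `wnid_from_filename` and `copy_captioned_images` are hypothetical.

```python
import shutil
from pathlib import Path


def wnid_from_filename(filename: str) -> str:
    """Return the WNID prefix of an ILSVRC training filename (<wnid>_<index>.JPEG)."""
    return filename.split("_", 1)[0]


def copy_captioned_images(src_root, dst_root, captioned_filenames):
    """Copy images named in captioned_filenames from src_root to dst_root.

    Hypothetical helper: assumes src_root contains one sub-folder per WNID.
    Returns the number of files copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = 0
    for name in sorted(set(captioned_filenames)):
        wnid = wnid_from_filename(name)
        src = src_root / wnid / name
        if not src.is_file():
            continue  # captioned image not found in the download; skip it
        dst = dst_root / wnid / name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy, so the source folder can be removed later
        copied += 1
    return copied
```

With the paths used above, a call would be `copy_captioned_images("ilsvrc2012/ILSVRC2012_img_train", "ilsvrc2012/ILSVRC2012_img_train_selected", filenames)`, where `filenames` is whatever list of captioned image names you extracted from imagenet_captions.json.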