Visual Feature representation

Name: Each feature file is named [File ID].npy, where the file ID matches the image's file ID in Flickr30K Entities.
Proposal generation: We use Selective Search to generate proposals for each image in Flickr30K Entities, and Edge Box to generate proposals for each image in the Referit Game dataset. We keep the top 100 proposals per image.
Feature extractor: We use a Faster-RCNN network pre-trained on PASCAL VOC 2012 for Flickr30K Entities and one pre-trained on ImageNet for Referit Game. Before extracting visual features, we fine-tune each network on its respective dataset. The visual features for each image in both datasets are stored as a 100 x 4096 matrix, where each row is the feature (fc7 layer of Faster-RCNN) of one proposal bounding box.
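
As a quick sanity check after downloading, a feature file can be loaded with NumPy and compared against the 100 x 4096 shape described above. This is only a minimal sketch: it assumes each .npy file holds a single 2-D array, and the helper name `load_features`, the file ID `12345`, and the float32 stand-in data are all hypothetical, not part of the released files.

```python
import os
import tempfile

import numpy as np

NUM_PROPOSALS = 100  # top proposals kept per image (from the description above)
FEATURE_DIM = 4096   # fc7 feature size (from the description above)


def load_features(feature_dir, file_id):
    """Load [File ID].npy and check that it has the documented 100 x 4096 shape."""
    path = os.path.join(feature_dir, f"{file_id}.npy")
    features = np.load(path)
    if features.shape != (NUM_PROPOSALS, FEATURE_DIM):
        raise ValueError(f"{path}: unexpected shape {features.shape}")
    return features


if __name__ == "__main__":
    # Stand-in data so the sketch runs without the real download.
    with tempfile.TemporaryDirectory() as tmp:
        fake = np.zeros((NUM_PROPOSALS, FEATURE_DIM), dtype=np.float32)
        np.save(os.path.join(tmp, "12345.npy"), fake)  # hypothetical file ID
        feats = load_features(tmp, "12345")
        print(feats.shape)  # (100, 4096)
```

With the real data, `feature_dir` would point at the unzipped feature folder, and row i of the returned matrix would be the feature of the i-th proposal for that image.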

Fine-tuned visual features Download

Flickr30K Entities

Proposals generated by Selective Search: link (Google Drive, zip file of 27MB, 126MB after unzipping). Note: Each Selective Search proposal is stored as [ymin, xmin, ymax, xmax].
Fine-tuned visual features: link (Google Drive, zip file of 18GB, 98GB after unzipping)
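
Many detection tools expect boxes as [xmin, ymin, xmax, ymax], so the Selective Search proposals may need their columns reordered before use. The sketch below assumes the proposals have already been loaded into an N x 4 array; the on-disk format inside the zip file is not described here, and the function name `yxyx_to_xyxy` and the example box are made up for illustration.

```python
import numpy as np


def yxyx_to_xyxy(boxes):
    """Reorder [ymin, xmin, ymax, xmax] rows into [xmin, ymin, xmax, ymax]."""
    boxes = np.asarray(boxes)
    # Swap each (y, x) pair; works for a single box or an N x 4 array.
    return boxes[..., [1, 0, 3, 2]]


# Made-up proposal in the Selective Search layout: ymin, xmin, ymax, xmax.
proposals = np.array([[10, 20, 110, 220]])
print(yxyx_to_xyxy(proposals).tolist())  # [[20, 10, 220, 110]]
```

Applying the same function twice returns the original layout, so it can also convert boxes back to [ymin, xmin, ymax, xmax].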

Referit Game

Proposals generated by Edge Box: link (Google Drive, zip file of 19MB, 82MB after unzipping).
Fine-tuned visual features: link (Google Drive, zip file of 12GB, 62GB after unzipping)
