Each feature file is named [File ID].pkl, where [File ID] corresponds to the file ID in Flickr30K Entities.
Each file is a dictionary containing query features and annotations. The name and content of each key are listed below:
gt_pos_all: A list of length N, where N is the number of queries. The i-th element is itself a list, recording the IDs of the i-th query's positive proposals among the 100 proposals generated by Selective Search or Edge Box. A proposal is positive if its Intersection over Union (IoU) with the ground truth bounding box of the i-th query is larger than 0.5.
pos_id: An N-dimensional vector. The i-th element is the ID of the proposal that overlaps most with the ground truth bounding box of the i-th query. If that proposal's IoU is less than 0.5, the proposal ID is set to -1.
sens: A list of length N. The i-th element is itself a list, holding the word ID sequence of the i-th query.
gt_box: An N x 4 matrix. The i-th row is the ground truth bounding box annotation for the i-th query, in the form [xmin, ymin, xmax, ymax].
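
The relationship between these fields can be illustrated with a small sketch. It is not the code that produced the released files: the helper names (`iou`, `label_proposals`), the toy proposals, and the word IDs are all made up for illustration, and the IoU here uses continuous coordinates without a +1 pixel offset, which may differ from the original preprocessing. It builds a dictionary with the four keys described above, round-trips it through `pickle`, and checks the shapes.

```python
import pickle

import numpy as np


def iou(box_a, box_b):
    """IoU of two boxes given as [xmin, ymin, xmax, ymax] (assumed convention, no +1 offset)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_box, thresh=0.5):
    """Hypothetical helper: return (positive proposal IDs, best proposal ID or -1)."""
    ious = [iou(p, gt_box) for p in proposals]
    positives = [j for j, v in enumerate(ious) if v > thresh]
    best = int(np.argmax(ious))
    # Mirror the description: the best proposal is replaced by -1 if its IoU is below 0.5.
    pos_id = best if ious[best] >= thresh else -1
    return positives, pos_id


# Toy data: 3 proposals (the real files refer to 100) and N = 2 queries.
proposals = [[0, 0, 10, 10], [0, 0, 10, 6], [20, 20, 30, 30]]
gt_boxes = [[0, 0, 10, 10], [50, 50, 60, 60]]

labels = [label_proposals(proposals, gt) for gt in gt_boxes]
record = {
    "gt_pos_all": [pos for pos, _ in labels],       # list of N lists
    "pos_id": np.array([pid for _, pid in labels]), # N-dimensional vector
    "sens": [[4, 17, 9], [23]],                     # made-up word ID sequences
    "gt_box": np.array(gt_boxes),                   # N x 4 matrix
}

# Round-trip through pickle, as when reading a [File ID].pkl file.
loaded = pickle.loads(pickle.dumps(record))
num_queries = len(loaded["gt_pos_all"])
assert loaded["pos_id"].shape == (num_queries,)
assert len(loaded["sens"]) == num_queries
assert loaded["gt_box"].shape == (num_queries, 4)
```

When reading an actual feature file, `pickle.load` on a file opened in binary mode should be enough; if the file was written by Python 2, passing `encoding="latin1"` to `pickle.load` may be needed.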