Data and supplementary materials for the Web Conference 2021 paper "Generating Accurate Caption Units For Figure Captioning".
While the paper will appear in the ACM Digital Library (dl.acm.org), the final camera-ready version is currently available here.
In our view, figure captioning is a visionary problem. Our work along this line is a proof of concept of ML capability, enabled by the availability of synthetic figure question-answering data.
- Data
- Model Hyperparameters and Other Design Choices
- Aggregated Perfect Accuracy
- Intellectual Property Note
- Contacts
This directory contains the following materials.
- Dataset (`DVQA-cap` and `FigureQA-cap`): ground-truth captions for modeling, converted from the `DVQA` and `FigureQA` datasets. Includes the full splits `train`, `val`, `test_easy`, and `test_hard`. Due to size limits, we provide a Google Drive link for the `train` split. All splits follow the same schema below.
- `quality-validation.xlsx`: a spreadsheet of quality-validation results. Two co-authors validated a sample of captions from the `test_hard` split of the `captions.json` files along two dimensions: accuracy and grammar. The sample covers 20 random captions for each caption type in each dataset.
- `user-study-12-figures.html` (along with the `user-study-png-output` directory): the 12 figures used in the Google Forms user study.
- `aggregated-perfect-accuracy`: calculation of perfect-accuracy scores, as additional results for Tables 3 and 5.
In each split subdirectory, the file `captions.json` contains ground-truth captions and figure metadata that follow our problem formulation, for modeling.
Please download the figure images from the original repositories of the DVQA and FigureQA datasets to keep figures consistent.
The code below reads the JSON objects in a `captions.json` file and illustrates their schema.
import json

# Load one split of the converted captions, e.g. the FigureQA validation split.
with open("FigureQA/captions.json", "r") as f:
    jobject = json.load(f)

print("Top-level keys in this JSON object:", list(jobject.keys()))
print()
print("Total caption count in this split:", len(jobject['captions']))
print()

# Unique fine-grained caption types. The naming differs slightly from the
# Table 1 definitions, e.g. "horizontal-vertical" refers to the figure type.
caption_types = {item['caption_template_fine_grained'] for item in jobject['captions']}
print("Unique caption types:", caption_types)
print()

# Metadata for one figure: a dynamic dictionary that includes bounding-box positions.
first_key = next(iter(jobject['metadata']))
print("Metadata for one figure:", jobject['metadata'][first_key])
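As a quick follow-on, captions can be grouped by their fine-grained type once loaded. The sketch below uses a tiny in-memory object that mimics the `captions` list schema from the snippet above (the caption-type strings here are invented placeholders, not the actual type names in the dataset):

```python
from collections import Counter

# Toy object mimicking the captions.json schema; the type values are
# hypothetical placeholders, not the real caption-type names.
jobject = {
    "captions": [
        {"caption_template_fine_grained": "horizontal-vertical"},
        {"caption_template_fine_grained": "horizontal-vertical"},
        {"caption_template_fine_grained": "max-value"},
    ],
}

# Count how many captions fall under each fine-grained caption type.
type_counts = Counter(
    item["caption_template_fine_grained"] for item in jobject["captions"]
)
print(type_counts)
```

Replacing the toy `jobject` with one loaded from a real `captions.json` split gives the per-type caption distribution for that split.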
Thank you to readers interested in the source implementation. Unfortunately, we cannot share it here due to company policy.
Please email questions to Xin Qian (xinq@umd.edu). Thank you!