Multimodal model based on the GraphDoc paper, adapted to work on Visual Question Answering. (Work in progress.)

RiccardoRiccio/GraphDoc-VQA

GraphDoc

The source code for Multimodal Pre-training Based on Graph Attention Network for Document Understanding.

Requirements

  • torch==1.7.1
  • mmdet==2.16.0
  • transformers==4.6.0
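
The pinned versions above can be checked against the current environment before running anything. The following is a minimal sketch, not part of the repository: the helper name `check_requirements` and the `REQUIRED` table are hypothetical, and it assumes Python 3.8+ so that `importlib.metadata` is available. Local build suffixes such as `+cu110` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
REQUIRED = {"torch": "1.7.1", "mmdet": "2.16.0", "transformers": "4.6.0"}


def check_requirements(required=REQUIRED, get_version=version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    Hypothetical helper for illustration; `found` is None when the
    package is not installed.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Strip local build tags such as "+cu110" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_requirements().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means all three pins match; anything else lists the expected and installed versions side by side.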

Pretrained-Model

We provide the pretrained model required for downstream tasks; it can be downloaded from https://rec.ustc.edu.cn/share/031c0580-0366-11ed-bb15-47281881a56b. Unzip the pretrained model into the base folder so that it is located at graphdoc/pretrained_model.
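
The unzip step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' tooling: the function name `extract_pretrained` is hypothetical, the archive filename is whatever the download produces, and the sketch assumes the archive contains a top-level `pretrained_model/` directory. If the archive is laid out differently, the check at the end fails rather than silently continuing.

```python
import zipfile
from pathlib import Path


def extract_pretrained(archive_path, base_dir="graphdoc"):
    """Unzip the archive into base_dir and return base_dir/pretrained_model.

    Hypothetical helper; assumes the archive holds a top-level
    `pretrained_model/` directory.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(base)
    target = base / "pretrained_model"
    if not target.is_dir():
        # The archive layout did not match the assumption above.
        raise FileNotFoundError(
            f"expected {target} after extraction; check the archive layout"
        )
    return target
```

Calling `extract_pretrained("<downloaded archive>.zip")` from the repository root would then leave the weights at graphdoc/pretrained_model, the location the README asks for.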

Usage

We provide example code for extracting document representations with GraphDoc in runner/graphdoc/encode_document.py.

Citation

If you find GraphDoc useful in your research, please consider citing:

@article{zrzhang2022graphdoc,
    author = {Zhang, Zhenrong and Ma, Jiefeng and Du, Jun and Wang, Licheng and Zhang, Jianshu},
    title = {Multimodal Pre-training Based on Graph Attention Network for Document Understanding},
    journal = {arXiv},
    year = {2022},
    volume={abs/2203.13530}
}

Contact

zzr666@mail.ustc.edu.cn
