This is an implementation of the paper RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER.
- Download multi-modal NER dataset Twitter-15 (Zhang et al., 2018) from here to this path.
- Download multi-modal NER dataset Twitter-17 (Lu et al., 2018) to this path.
- Download text-image relationship dataset (Vempala et al., 2019) from here to this path.
Run loader.py to verify that the dataset statistics match those reported in (Zhang et al., 2018) and (Lu et al., 2018).
| Twitter-15 | NUM | PER | LOC | ORG | MISC |
|---|---|---|---|---|---|
| Training | 4000 | 2217 | 2091 | 928 | 940 |
| Development | 1000 | 552 | 522 | 247 | 225 |
| Testing | 3257 | 1816 | 1697 | 839 | 726 |

| Twitter-17 | NUM | TOKEN |
|---|---|---|
| Training | 4290 | 68655 |
| Development | 1432 | 22872 |
| Testing | 1459 | 23051 |
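
The sketch below shows one way that counts like those in the Twitter-15 table could be recomputed by hand. It does not reproduce loader.py: the input format (one whitespace-separated token and BIO tag per line, with a blank line between sentences), the function name `count_statistics`, and the reading of NUM as the number of sentences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def count_statistics(lines: Iterable[str]) -> Counter:
    """Count sentences and entity mentions in BIO-tagged, CoNLL-style lines.

    Assumed format (hypothetical, not taken from loader.py): one
    "token tag" pair per line, with a blank line between sentences.
    """
    stats = Counter()
    in_sentence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            in_sentence = False  # a blank line closes the current sentence
            continue
        if not in_sentence:
            stats["NUM"] += 1  # first token of a new sentence
            in_sentence = True
        tag = line.split()[-1]
        if tag.startswith("B-"):
            stats[tag[2:]] += 1  # each B- tag starts exactly one entity
    return stats


# Tiny hand-made sample, not real dataset content.
sample = [
    "Kevin B-PER",
    "Durant I-PER",
    "plays O",
    "in O",
    "Oakland B-LOC",
    "",
    "NBA B-ORG",
    "finals O",
]
stats = count_statistics(sample)
print(dict(stats))
```

Counting only `B-` tags gives the number of entity mentions rather than the number of entity tokens, which is the usual convention for per-type NER statistics.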
- Download pre-trained ResNet-101 weights from here to this path.
- Download pre-trained BERT-Base weights from here to this path.
- Download pre-trained word embeddings from here to this path.
- tqdm
- Pillow
- numpy
- torch
- torchvision
- transformers
- flair
- pytorch-crf
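
Before training, it can help to confirm that every package listed above is importable. The following is a minimal sketch and not part of the repository; the helper name `missing_packages` is hypothetical. Note that Pillow is imported as `PIL` and pytorch-crf as `torchcrf`.

```python
from importlib.util import find_spec

# Map each package listed above to the module name used at import time.
REQUIRED_MODULES = {
    "tqdm": "tqdm",
    "Pillow": "PIL",
    "numpy": "numpy",
    "torch": "torch",
    "torchvision": "torchvision",
    "transformers": "transformers",
    "flair": "flair",
    "pytorch-crf": "torchcrf",
}


def missing_packages(modules: dict) -> list:
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, mod in modules.items() if find_spec(mod) is None]


missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as torch.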
# BERT-BiLSTM-CRF
python main.py --stacked --rnn --crf --dataset [dataset_id] --cuda [gpu_id]
# RpBERT-BiLSTM-CRF
python main.py --stacked --rnn --crf --encoder_v resnet101 --aux --gate --dataset [dataset_id] --cuda [gpu_id]
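
To run both configurations from a script, the two command lines above can be assembled programmatically. This is only a sketch: `build_command` is a hypothetical helper, the flags are copied from the commands above, and the dataset and GPU identifiers stay as placeholders because their valid values are not listed here.

```python
# Flags shared by both configurations, copied from the commands above.
BASE_FLAGS = ["--stacked", "--rnn", "--crf"]
# Extra flags that turn the BERT baseline into RpBERT.
RPBERT_FLAGS = ["--encoder_v", "resnet101", "--aux", "--gate"]


def build_command(dataset_id, gpu_id, rpbert=False):
    """Return the argument list for one training run (hypothetical helper)."""
    cmd = ["python", "main.py", *BASE_FLAGS]
    if rpbert:
        cmd += RPBERT_FLAGS
    cmd += ["--dataset", str(dataset_id), "--cuda", str(gpu_id)]
    return cmd


# Placeholders are kept as-is; substitute real values before running.
for use_rpbert in (False, True):
    print(" ".join(build_command("[dataset_id]", "[gpu_id]", use_rpbert)))
```

Once the placeholders are replaced, the returned list can be passed to `subprocess.run(cmd, check=True)` to launch training.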