IPGPF

Code for the EMNLP 2023 long paper "An Iteratively Parallel Generation Method with the Pre-Filling Strategy for Document-level Event Extraction".

Data Preparation

ChFinAnn: https://github.com/dolphin-zs/Doc2EDAG/blob/master/Data.zip

DuEE-fin: https://aistudio.baidu.com/aistudio/competition/detail/46

Download the data from the links above, then create the data directory:

cd IPGPF
mkdir ./Data

For ChFinAnn data, unzip it into the ./Data directory.
For DuEE-fin data, unzip it into the ./Data directory as well.
Then preprocess the DuEE-fin data into the same format as the ChFinAnn data:

cd dee
python3 build_duee_data.py
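
The unzip step above can also be done from Python. This is a minimal sketch, not part of the repository: the helper name `unzip_to_data` and the archive path in the usage comment are assumptions for illustration, and the layout inside each archive is not specified here.

```python
import zipfile
from pathlib import Path


def unzip_to_data(archive_path, data_dir="./Data"):
    """Extract a downloaded archive into the data directory.

    Hypothetical helper: creates data_dir if it is missing, extracts
    every member of the archive into it, and returns the names of the
    top-level entries now present in data_dir.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())


# Example call (the archive location is an assumption):
# unzip_to_data("Data.zip", "./Data")
```

Checking the returned entry names against what dee/build_duee_data.py expects is a quick way to confirm the files landed in the right place before preprocessing.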

Training

# For a machine with 8 GPUs
# ChFinAnn dataset
$ bash train_chfinann.sh 8
# DuEE-fin dataset
$ bash train_duee.sh 8

Before training, set the dataset flag at line 369 of dee/event_type.py to match the dataset you are training on (index 0 selects ChFinAnn, index 1 selects DuEE-fin):

dataset = ['ChFinAnn', 'DuEE-fin'][1]

To run the control-variable experiments:

# ChFinAnn dataset
$ bash train_chfinann_CV.sh 8
# DuEE-fin dataset
$ bash train_duee_CV.sh 8

Before running these, set both the dataset flag at line 369 and the entity merge flag at line 370 of dee/event_type.py:

dataset = ['ChFinAnn', 'DuEE-fin'][1]
merge_entity = [True, False][1]
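
Both flags are ordinary Python list lookups, so the trailing index decides which option is active. The sketch below only illustrates that indexing; the `*_options` variable names are introduced here for clarity and do not appear in dee/event_type.py.

```python
# Each flag picks one element from a list of allowed values.
dataset_options = ['ChFinAnn', 'DuEE-fin']
merge_entity_options = [True, False]

# Index 0 selects the first option, index 1 the second.
dataset = dataset_options[0]            # 'ChFinAnn'
merge_entity = merge_entity_options[1]  # False

print(dataset, merge_entity)
```

Switching datasets therefore means changing only the final index on line 369, not the list itself.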

Please note:

With a large number of gradient accumulation steps, large-batch training is possible on a few common GPUs. Specifically, on a Tesla V100 (32GB memory), you should maintain B/(N*G) == 1, where B, N and G denote the batch size, the number of GPUs, and the number of gradient accumulation steps, respectively.
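
The constraint B/(N*G) == 1 can be solved for G once B and N are fixed. The helper below is a small sketch of that arithmetic; the function name `grad_accum_steps` and the example batch size of 64 are assumptions for illustration, not values taken from the training scripts.

```python
def grad_accum_steps(batch_size: int, num_gpus: int) -> int:
    """Return G such that batch_size / (num_gpus * G) == 1."""
    if batch_size <= 0 or num_gpus <= 0:
        raise ValueError("batch_size and num_gpus must be positive")
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    return batch_size // num_gpus


# Illustrative values: a batch size of 64 on 8 GPUs.
print(grad_accum_steps(64, 8))  # 8
```

With fewer GPUs the same batch size simply needs more accumulation steps, for example 16 steps on 4 GPUs for a batch size of 64.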

Inference

To get inference results for the DuEE-fin dataset, choose the best checkpoint on the dev set and run:

$ bash duee_inference.sh

Citation

If you find our work useful for your research, please cite the paper below:

@inproceedings{huang2023iteratively,
  title={An iteratively parallel generation method with the pre-filling strategy for document-level event extraction},
  author={Huang, Guanhua and Xu, Runxin and Zeng, Ying and Chen, Jiaze and Yang, Zhouwang and Weinan, E},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={10834--10852},
  year={2023}
}
