TriggerNER

Code & Data for ACL 2020 paper:

TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition

Authors: Bill Yuchen Lin*, Dong-Ho Lee*, Ming Shen, Ryan Moreno, Xiao Huang, Prashant Shiralkar, Xiang Ren

We introduce entity triggers, an effective proxy of human explanations for facilitating label-efficient learning of NER models. We crowd-sourced 14k entity triggers for two well-studied NER datasets. Our proposed model, named Trigger Matching Network, jointly learns trigger representations and a soft matching module with self-attention, so that it can generalize easily to unseen sentences for tagging. Experiments show that the framework is significantly more cost-effective: using 20% of the trigger-annotated sentences yields performance comparable to conventional supervised approaches using 70% of the training data.

If you make use of this code or the entity triggers in your work, please kindly cite the following paper:

@inproceedings{TriggerNER2020,
  title={TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition},
  author={Bill Yuchen Lin and Dong-Ho Lee and Ming Shen and Ryan Moreno and Xiao Huang and Prashant Shiralkar and Xiang Ren},
  booktitle={Proceedings of ACL},
  year={2020}
}

Trigger Dataset

Entity triggers are a novel form of explanatory annotation for named entity recognition problems.
We crowd-sourced and publicly release 14k annotated entity triggers on two popular datasets: CoNLL03 (generic domain) and BC5CDR (biomedical domain).

dataset/ contains the CONLL, BC5CDR, and Laptop-Reviews datasets. In each directory:

train.txt, test.txt, dev.txt are the original dataset splits.
train_20.txt is a 20% subset of the original training set for the baseline setting. It is used in naive.py.
trigger_20.txt is the trigger dataset. It is used in supervised.py and semi_supervised.py.
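
To inspect these files before training, a small reader can help. The sketch below assumes a CoNLL-style layout (one token per line with its label in the last whitespace-separated column, and a blank line between sentences). This README does not specify the layout, so check it against the actual files. The function name read_conll and the sample text are hypothetical and not part of this repository.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group (token, label) pairs into sentences.

    Assumption: the token is the first column, the label is the last
    column, and a blank line ends a sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    # Keep the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample text, not taken from the released data.
sample = """EU B-ORG
rejects O
German B-MISC
call O

Peter B-PER
Blackburn I-PER
"""

sentences = read_conll(sample.splitlines())
print(len(sentences), "sentences")
```

To read a real file, pass an open file handle instead of the sample, for example read_conll(open(path, encoding="utf-8")), where path points to one of the files listed above.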

The trigger dataset used by supervised.py and semi_supervised.py covers 20% of the original training data, so --percentage is applied relative to that subset. For example, to train on 3% of the original training dataset, use --percentage 15 (15% of 20% is 3%).
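
The conversion between a share of the original training data and the --percentage value can be sketched as follows. Both helper names are hypothetical and exist only for this illustration. The one fact taken from this README is that the trigger dataset is a 20% subset of the original training data.

```python
import math

# From this README: the trigger dataset is 20% of the original training data.
SUBSET_PERCENT = 20.0


def percentage_flag(target_percent_of_original: float) -> float:
    """Return the --percentage value that yields the requested share
    of the original training data."""
    if not 0 < target_percent_of_original <= SUBSET_PERCENT:
        raise ValueError("target must be in (0, 20] percent of the original data")
    return 100.0 * target_percent_of_original / SUBSET_PERCENT


def effective_percent(flag_value: float) -> float:
    """Return the share of the original training data that a given
    --percentage value corresponds to."""
    return SUBSET_PERCENT * flag_value / 100.0


# 3% of the original data corresponds to --percentage 15.
assert math.isclose(percentage_flag(3), 15.0)
# --percentage 100 uses the whole 20% trigger subset.
assert math.isclose(effective_percent(100), 20.0)
```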

Requirements

Python >= 3.6 and PyTorch >= 0.4.1

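python -m pip install -r requirements.txt

Note: the repository file listing spells this file requirments.txt; if the command above cannot find the file, pass that name instead.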

Train and Test

Train/Test Baseline (Bi-LSTM / CRF with 20% of the training dataset):

python naive.py --dataset CONLL
python naive.py --dataset BC5CDR

Train/Test Trigger Matching Network in the supervised setting:

python supervised.py --dataset CONLL
python supervised.py --dataset BC5CDR

Train/Test Trigger Matching Network in the semi-supervised setting (self-training):

python semi_supervised.py --dataset CONLL
python semi_supervised.py --dataset BC5CDR
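
To run all six commands above in sequence, a small driver script is one option. The sketch below is not part of this repository, and the names build_commands and run_all are hypothetical. It uses only the script names and the --dataset flag shown above, and it defaults to a dry run that prints the commands without launching them.

```python
import subprocess
import sys
from typing import List

# Script and dataset names exactly as they appear in the commands above.
SCRIPTS = ["naive.py", "supervised.py", "semi_supervised.py"]
DATASETS = ["CONLL", "BC5CDR"]


def build_commands() -> List[List[str]]:
    """Build one command per (script, dataset) pair, in README order."""
    return [
        [sys.executable, script, "--dataset", dataset]
        for script in SCRIPTS
        for dataset in DATASETS
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; launch it only when dry_run is False."""
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing run.
            subprocess.run(command, check=True)


# Dry run by default: prints the six commands without launching training.
run_all(dry_run=True)
```

Call run_all(dry_run=False) from the repository root to launch the runs one after another.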

Our code is based on https://github.com/allanj/pytorch_lstmcrf.

INK Lab at USC
