GitHub - edchengg/GigaBERT: EMNLP 2020 GigaBERT Arabic Relation extraction system, named entity recognition, IE

GigaBERT

This repo contains the code and data for the downstream tasks in the GigaBERT paper:

@inproceedings{lan2020gigabert,
  author     = {Lan, Wuwei and Chen, Yang and Xu, Wei and Ritter, Alan},
  title      = {Giga{BERT}: Zero-shot Transfer Learning from {E}nglish to {A}rabic},
  booktitle  = {Proceedings of The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year       = {2020}
}

Huggingface Supported

Arabic Relation Extraction

from transformers import AutoModelForSequenceClassification, AutoTokenizer
re_model = AutoModelForSequenceClassification.from_pretrained("ychenNLP/arabic-relation-extraction")
re_tokenizer = AutoTokenizer.from_pretrained("ychenNLP/arabic-relation-extraction")

Arabic NER

from transformers import AutoModelForTokenClassification, AutoTokenizer
ner_model = AutoModelForTokenClassification.from_pretrained("ychenNLP/arabic-ner-ace")
ner_tokenizer = AutoTokenizer.from_pretrained("ychenNLP/arabic-ner-ace")

Arabic Relation Extraction Pipeline

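The pipeline runs in two stages: NER first, then relation extraction over the detected entities (NER --> Relation Extraction).
See relation_extraction/re_pipeline.py for the processing functions process_ner_output and post_process_re_output used below.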
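To make the hand-off between the two stages concrete, here is a minimal sketch of how entity pairs could be turned into classifier inputs. It is a hypothetical illustration, not the code in relation_extraction/re_pipeline.py: the function name build_pair_inputs, the marker strings <a1>/<a2>, and the choice to pair entities in text order are all assumptions. It relies only on the start and end character offsets that the Hugging Face NER pipeline returns for each entity.

```python
from itertools import combinations

def build_pair_inputs(text, entities):
    """Hypothetical sketch: one marked-up copy of the sentence per entity pair.

    `entities` is a list of dicts with "start"/"end" character offsets,
    as produced by the Hugging Face "ner" pipeline. The <a1>/<a2> markers
    are illustrative placeholders, not the markers the released model expects.
    """
    ordered = sorted(entities, key=lambda e: e["start"])
    pairs = []
    for arg1, arg2 in combinations(ordered, 2):
        if arg1["end"] > arg2["start"]:
            continue  # skip overlapping spans
        marked = (
            text[: arg1["start"]]
            + "<a1>" + text[arg1["start"]: arg1["end"]] + "</a1>"
            + text[arg1["end"]: arg2["start"]]
            + "<a2>" + text[arg2["start"]: arg2["end"]] + "</a2>"
            + text[arg2["end"]:]
        )
        pairs.append({"arg1": arg1, "arg2": arg2, "re_input": marked})
    return pairs

# Toy English example with hand-written offsets (not model output).
text = "Biden met NATO leaders in Madrid."
entities = [
    {"entity_group": "PER", "word": "Biden", "start": 0, "end": 5},
    {"entity_group": "ORG", "word": "NATO", "start": 10, "end": 14},
    {"entity_group": "GPE", "word": "Madrid", "start": 26, "end": 32},
]
pairs = build_pair_inputs(text, entities)
for pair in pairs:
    print(pair["re_input"])
```

Each resulting dict carries a "re_input" string, matching the key the pipeline code reads when it calls the relation classifier. For real predictions, use the repository's own processing functions, since the released model was trained with its own input format.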

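from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer, AutoModelForSequenceClassification
# process_ner_output and post_process_re_output are defined in relation_extraction/re_pipeline.py;
# make them available (e.g. by copying or importing them) before running this script.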
ner_model = AutoModelForTokenClassification.from_pretrained("ychenNLP/arabic-ner-ace")
ner_tokenizer = AutoTokenizer.from_pretrained("ychenNLP/arabic-ner-ace")
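# grouped_entities=True merges word pieces into whole entity spans; recent transformers
# releases deprecate it in favor of aggregation_strategy="simple".
ner_pip = pipeline("ner", model=ner_model, tokenizer=ner_tokenizer, grouped_entities=True)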

re_model = AutoModelForSequenceClassification.from_pretrained("ychenNLP/arabic-relation-extraction")
re_tokenizer = AutoTokenizer.from_pretrained("ychenNLP/arabic-relation-extraction")
re_pip = pipeline("text-classification", model=re_model, tokenizer=re_tokenizer)

text_input = """ويتزامن ذلك مع اجتماع بايدن مع قادة الدول الأعضاء في الناتو في قمة موسعة في العاصمة الإسبانية، مدريد."""

ner_output = ner_pip(text_input)  # run NER to get entity spans and types

re_input = process_ner_output(ner_output, text_input)  # build one relation-extraction input per entity pair

re_output = []
for pair in re_input:
    tmp_re_output = re_pip(pair["re_input"])  # predict the relation type for this entity pair
    re_output.append(tmp_re_output[0])

re_ner_output = post_process_re_output(re_output, text_input, ner_output)  # merge NER and relation predictions
print("Sentence: ", re_ner_output["input"])
print('====Entity====')
for ent in re_ner_output["entity"]:
    print('{}--{}'.format(ent["word"], ent["entity_group"]))
print('====Relation====')
for rel in re_ner_output["relation"]:
    print('{}--{}:{}'.format(rel['arg1']['word'], rel['arg2']['word'], rel['relation_type']['label']))
  
'''
Sentence:  ويتزامن ذلك مع اجتماع بايدن مع قادة الدول الأعضاء في الناتو في قمة موسعة في العاصمة الإسبانية، مدريد.
====Entity====
بايدن--PER
قادة--PER
الدول--GPE
الناتو--ORG
العاصمة--GPE
الاسبانية--GPE
مدريد--GPE
====Relation====
قادة--الدول:ORG-AFF
الدول--الناتو:ORG-AFF
العاصمة--الاسبانية:PART-WHOLE
'''
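
For downstream use, the nested result is often easier to handle as flat triples. The sketch below assumes only the fields accessed in the printing code above (rel['arg1']['word'], rel['arg2']['word'], rel['relation_type']['label']); the helper names relations_to_triples and group_by_relation are hypothetical, and sample_output is hand-built to mirror the printed relations rather than taken from an actual model run.

```python
from collections import defaultdict

def relations_to_triples(re_ner_output):
    """Flatten the pipeline result into (arg1, relation_label, arg2) tuples."""
    return [
        (rel["arg1"]["word"], rel["relation_type"]["label"], rel["arg2"]["word"])
        for rel in re_ner_output["relation"]
    ]

def group_by_relation(triples):
    """Group (arg1, arg2) pairs under their relation label."""
    grouped = defaultdict(list)
    for head, label, tail in triples:
        grouped[label].append((head, tail))
    return dict(grouped)

# Hand-built stand-in that mirrors the relations printed above;
# a real re_ner_output may carry additional fields.
sample_output = {
    "relation": [
        {"arg1": {"word": "قادة"}, "arg2": {"word": "الدول"},
         "relation_type": {"label": "ORG-AFF"}},
        {"arg1": {"word": "الدول"}, "arg2": {"word": "الناتو"},
         "relation_type": {"label": "ORG-AFF"}},
        {"arg1": {"word": "العاصمة"}, "arg2": {"word": "الاسبانية"},
         "relation_type": {"label": "PART-WHOLE"}},
    ],
}

triples = relations_to_triples(sample_output)
grouped = group_by_relation(triples)
for label, arg_pairs in grouped.items():
    print(label, arg_pairs)
```

Keeping the label in the middle of each tuple gives the usual subject, relation, object order, which makes the triples easy to load into a table or graph.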

Funding Acknowledgment

This material is based in part on research sponsored by IARPA via the BETTER program (2019-19051600004).

