pierrelarmande/OryzaGP

OryzaGP

A dataset for named entity recognition (NER) of rice genes

Citation

Please cite with the following reference:

Updating OryzaGP dataset during BLAH7

The aims of this project are to:

  • update the datasets with new PubMed entries
  • annotate gene/protein entities

Step 1: updating OryzaGP with new PubMed entries

Step 2: creating a new pub dictionary

  • To create or use entity recognition (ER) tools, we need to set up a dictionary of gene/protein entities
  • A first file, pub_dictionnary.txt, was created from the Oryzabase gene dataset
  • A second file, pub_dictionnary_with_rapdb_URI.txt, was created from the same dataset
    • each line contains a label (gene name, symbol, or synonym) [TAB] RAP-DB database URI
  • A third file, pub_dictionnary_with_msu.txt, was created from the same dataset
    • each line contains a label (gene name, symbol, or synonym) [TAB] MSU database URI
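
The label [TAB] URI layout described above can be read with a few lines of Python. This is a minimal sketch, not the script used to build the files: the function name `load_pub_dictionary`, the sample labels, and the `example.org` URIs are placeholders invented for illustration, and the real files may contain header lines or extra columns that this sketch does not handle.

```python
import csv
import io
from collections import defaultdict


def load_pub_dictionary(handle):
    """Read label<TAB>URI lines into a mapping: label -> list of URIs."""
    entries = defaultdict(list)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or malformed lines that lack a second column.
        if len(row) < 2:
            continue
        label, uri = row[0].strip(), row[1].strip()
        # Keep each URI once per label; a label may map to several URIs.
        if label and uri and uri not in entries[label]:
            entries[label].append(uri)
    return dict(entries)


# Placeholder content standing in for a dictionary file (not real data).
SAMPLE = (
    "example gene 1\thttp://example.org/gene/0001\n"
    "EG1\thttp://example.org/gene/0001\n"
    "example gene 2\thttp://example.org/gene/0002\n"
)

demo = load_pub_dictionary(io.StringIO(SAMPLE))
```

To read one of the files from this repository instead of the in-memory sample, pass an open file handle, for example `open("pub_dictionnary_with_rapdb_URI.txt", encoding="utf-8", newline="")`.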

Step 3: creating PubDictionary Annotators

  • We created two annotators for each pub dictionary (single and batch mode)
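
To illustrate the difference between single and batch mode, here is a minimal local sketch of dictionary-based annotation. It is not the annotators created in this project: `annotate` and `annotate_batch` are hypothetical names, the matching is a simple case-insensitive exact match that prefers longer labels, and the dictionary entries are placeholders.

```python
import re


def annotate(text, dictionary):
    """Single mode: return (begin, end, matched_text, uri) for one text."""
    if not dictionary:
        return []
    # Try longer labels first so a full name wins over a label it contains.
    labels = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(label) for label in labels) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {label.lower(): uri for label, uri in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup.get(m.group(0).lower()))
        for m in pattern.finditer(text)
    ]


def annotate_batch(texts, dictionary):
    """Batch mode: annotate several texts, one result list per text."""
    return [annotate(text, dictionary) for text in texts]


# Placeholder dictionary and text (not real OryzaGP data).
toy_dictionary = {
    "example gene 1": "http://example.org/gene/0001",
    "EG1": "http://example.org/gene/0001",
}
toy_text = "Expression of example gene 1 (EG1) was higher."

single_hits = annotate(toy_text, toy_dictionary)
batch_hits = annotate_batch([toy_text, "no genes here"], toy_dictionary)
```

Single mode suits interactive checks on one abstract, while batch mode processes a list of abstracts in one call and returns the results in the same order.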
