Added PubMed embeddings computed by @jessepeng #519
Conversation
Is the size of the hidden layer(s) and the number of layers known for these models? This would be interesting information for comparative experiments.
Hi @khituras - I believe the model was trained with a hidden size of 1150 and 3 layers, with BPTT truncated at a sequence length of 240. It was only trained over a 5% sample of PubMed abstracts until 2015, which is 1,219,734 abstracts. @jessepeng is this correct?
Yes, this is correct. Below are the hyperparameters used for training:
@jessepeng Thank you so much for this specification. Was there a specific evaluation strategy that led you to choose these parameters?
Yes, good point - we'll add this to the documentation with the release!
Could you share the statistics of the test and validation datasets, and the perplexity on each?
@khituras No, I chose most of those parameters because they were the standard parameters of Flair. I did, however, choose the number of layers and the number of hidden dimensions in accordance with a word-level LM I also trained on the same corpus. The architecture and hyperparameters I chose for this LM follow Merity et al. 2017. @pinal-patel The dataset consisting of the aforementioned 1,219,734 abstracts was split 60/10/30 into train/validation/test sets. The perplexities on train/val/test were 2.15/2.08/2.07 for the forward model and 2.19/2.10/2.09 for the backward model.
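For comparison experiments, here is a minimal sketch of how a character LM with these hyperparameters (3 layers, hidden size 1150, BPTT length 240) could be trained with Flair's LanguageModelTrainer; the corpus path, batch size, and epoch count are placeholders, not the settings actually used for the PubMed models:

from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# default character dictionary shipped with Flair
dictionary = Dictionary.load('chars')

# True for the forward model; repeat with False for the backward model
is_forward_lm = True

# corpus folder with train/ split, valid.txt and test.txt (placeholder path)
corpus = TextCorpus('/path/to/pubmed_sample',
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# 3 layers and hidden size 1150, as described above
language_model = LanguageModel(dictionary,
                               is_forward_lm,
                               hidden_size=1150,
                               nlayers=3)

trainer = LanguageModelTrainer(language_model, corpus)

# BPTT truncated at sequence length 240; batch size and epochs are placeholders
trainer.train('resources/language_models/pubmed-forward',
              sequence_length=240,
              mini_batch_size=100,
              max_epochs=10)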
@jessepeng Did you start the training from scratch on PubMed abstracts, or did you further fine-tune a model trained on Wikipedia or a similar dataset?
@jessepeng ?
@shreyashub I started training from scratch. I trained each direction for about 10 days on a GeForce GTX Titan X.
Hello @shreyashub, to fine-tune an existing LanguageModel, you only need to load an existing one instead of instantiating a new one. The rest of the training code remains the same as in Tutorial 9:

from flair.data import Dictionary
from flair.embeddings import FlairEmbeddings
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# instantiate an existing LM, such as one from the FlairEmbeddings
language_model = FlairEmbeddings('news-forward-fast').lm

# the loaded model brings its own character dictionary and direction
dictionary: Dictionary = language_model.dictionary
is_forward_lm = language_model.is_forward_lm

# get your corpus, processed forward and at the character level
corpus = TextCorpus('/path/to/your/corpus',
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# use the model trainer to fine-tune this model on your corpus
trainer = LanguageModelTrainer(language_model, corpus)

trainer.train('resources/taggers/language_model',
              sequence_length=10,
              mini_batch_size=10,
              max_epochs=10)

Note that when you fine-tune, you automatically use the same character dictionary as before and automatically copy the direction (forward/backward).
Does the same fine-tuning work for the pooled variant as well?
Yes, that works - the pooled variant just builds on top of FlairEmbeddings.
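For illustration, here is a minimal sketch of how the pooled variant is instantiated and used; the 'news-forward-fast' model name is only an example:

from flair.data import Sentence
from flair.embeddings import PooledFlairEmbeddings

# the pooled variant wraps a contextual string embedding model
# and pools embeddings of repeated words across sentences
embedding = PooledFlairEmbeddings('news-forward-fast')

sentence = Sentence('The grass is green .')
embedding.embed(sentence)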
@jessepeng computed a character LM over PubMed abstracts and shared the models with us. This PR adds them as FlairEmbeddings.
Init with:
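A minimal sketch, assuming the models are registered under the identifiers 'pubmed-forward' and 'pubmed-backward':

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# load the PubMed character LMs as contextual string embeddings
# (the 'pubmed-forward' / 'pubmed-backward' identifiers are assumed here)
pubmed_forward = FlairEmbeddings('pubmed-forward')
pubmed_backward = FlairEmbeddings('pubmed-backward')

# embed an example sentence
sentence = Sentence('The mutation affects the BRCA1 gene .')
pubmed_forward.embed(sentence)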