There are two implemented models (`WordLanguageModel`, `CharCompLanguageModel`), based on these two papers:
- Recurrent Neural Network Regularization (Zaremba, Vinyals, Sutskever) (2014)
- Character-Aware Neural Language Models (Kim, Jernite, Sontag, Rush) (2015)
To run the Zaremba model with their "medium regularized LSTM" configuration, early stopping, and pre-trained word vectors:
    python trainer.py --config config/ptb-med.json
The "medium regularized LSTM" above (Word Med below) has a lower perplexity than the original paper (even the large model). As noted above, the run above differs in that it uses pre-trained word vectors.
| Model | Framework | Dev PPL | Test PPL |
| --- | --- | --- | --- |
| Word Med (Zaremba) | TensorFlow | 80.168 | 77.2213 |
TODO: Add LSTM Char Small Configuration results
The loss that is optimized is the total loss divided by the total number of tokens in the mini-batch (token-level loss). This is different from how the loss is calculated in the TensorFlow tutorial, but it is how the loss is calculated in AWD-LSTM (Merity et al., 2017), ELMo (Peters et al., 2018), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018).
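As a minimal sketch of the token-level loss described above (not the trainer's actual code), assuming hypothetical tensors `logits` of shape `[batch, time, vocab]`, integer `targets` of shape `[batch, time]`, and a float `mask` that is 1 for real tokens and 0 for padding:

```python
import tensorflow as tf

def token_level_loss(logits, targets, mask):
    """Total cross-entropy over the mini-batch divided by the
    total number of (non-padded) tokens in that mini-batch."""
    # Per-token cross-entropy, shape [batch, time]
    per_token = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits)
    per_token = per_token * mask           # zero out padding positions
    total_loss = tf.reduce_sum(per_token)  # sum over batch and time
    total_toks = tf.reduce_sum(mask)       # number of real tokens
    return total_loss / total_toks         # token-level loss to optimize
```

Dividing by the token count (rather than, say, the batch size) is what makes the optimized loss directly comparable to the token-level perplexities reported below.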
When the loss is reported every `nsteps`, it is the total loss divided by the total number of tokens in the last `nsteps` mini-batches, and the reported perplexity is e raised to this loss.
The epoch loss is the total loss divided by the total number of tokens in the whole epoch, with the perplexity again being e raised to this loss. This yields token-level perplexity, which is the standard way of reporting in the literature.
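A minimal sketch of this reporting scheme, under assumed names: `batches` yields a `(batch_loss_sum, batch_num_toks)` pair per mini-batch and `nsteps` is the reporting interval (both hypothetical, not the trainer's actual variables):

```python
import math

nsteps = 100                            # assumed reporting interval
running_loss, running_toks = 0.0, 0     # accumulators for the last nsteps batches
epoch_loss, epoch_toks = 0.0, 0         # accumulators for the whole epoch

for step, (batch_loss_sum, batch_num_toks) in enumerate(batches, 1):
    running_loss += batch_loss_sum
    running_toks += batch_num_toks
    epoch_loss += batch_loss_sum
    epoch_toks += batch_num_toks

    if step % nsteps == 0:
        avg = running_loss / running_toks          # loss over last nsteps batches
        print(f'step {step}: loss {avg:.4f} ppl {math.exp(avg):.3f}')
        running_loss, running_toks = 0.0, 0

avg = epoch_loss / epoch_toks                      # token-level epoch loss
print(f'epoch: loss {avg:.4f} ppl {math.exp(avg):.3f}')  # token-level perplexity
```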