
von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs #3

Open
zomux opened this issue Jan 31, 2019 · 0 comments


Main Authors / Organization

Sachin Kumar & Yulia Tsvetkov

Carnegie Mellon University

PDF link

https://arxiv.org/abs/1812.04616

Hypothesis

A sequence-to-sequence model can be trained by computing the loss directly on continuous outputs, i.e., predicted word embeddings.

Because the model does not use a softmax over the vocabulary, it can in principle handle an unbounded (open) vocabulary.
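To make the continuous-output idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of how decoding works without a softmax: the model emits a vector, and the output word is the vocabulary entry whose pre-trained embedding is closest in cosine similarity. The names `decode_token` and `emb_table` are hypothetical.

```python
import numpy as np

def decode_token(e_hat, emb_table):
    """Return the index of the word whose embedding is most similar to e_hat.

    e_hat:     (m,) predicted output vector from the decoder
    emb_table: (V, m) pre-trained target embeddings, rows assumed unit-normalized
    """
    # cosine similarity reduces to a dot product with the normalized prediction
    scores = emb_table @ (e_hat / np.linalg.norm(e_hat))
    return int(np.argmax(scores))
```

Because this replaces the softmax projection with a single nearest-neighbor lookup, the output layer's cost no longer grows with vocabulary size.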

Approach

Train with a von Mises-Fisher (vMF) loss on the predicted embeddings, combined with two kinds of regularization that encourage high cosine similarity between the predicted and target embeddings.
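The negative log-likelihood of a vMF distribution can be sketched as below. This is a rough illustration under my reading of the paper: the predicted vector's norm acts as the concentration κ, `log_cm` is the log normalizer of the m-dimensional vMF density, and `lam1`/`lam2` stand in for the paper's two regularization variants (their exact form may differ). `scipy.special.ive` (the exponentially scaled Bessel function) is used for numerical stability.

```python
import numpy as np
from scipy.special import ive  # ive(v, k) = iv(v, k) * exp(-k)

def log_cm(kappa, m):
    """Log normalizer log C_m(kappa) of the vMF density in m dimensions."""
    v = m / 2.0 - 1.0
    log_bessel = np.log(ive(v, kappa)) + kappa  # log I_v(kappa), stably
    return v * np.log(kappa) - (m / 2.0) * np.log(2.0 * np.pi) - log_bessel

def nll_vmf(e_hat, e_w, lam1=0.0, lam2=1.0):
    """Negative log-likelihood vMF loss with illustrative regularization knobs.

    e_hat: (m,) predicted output vector (its norm plays the role of kappa)
    e_w:   (m,) unit-norm target word embedding
    lam1:  penalty on the predicted norm (discourages large kappa)
    lam2:  scaling of the cosine/dot-product term
    """
    kappa = np.linalg.norm(e_hat)
    m = e_hat.shape[0]
    return -log_cm(kappa, m) - lam2 * e_hat.dot(e_w) + lam1 * kappa
```

Note that the `- e_hat.dot(e_w)` term is what rewards cosine alignment: for a fixed norm, the loss is minimized when the prediction points in the same direction as the target embedding.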

Main Experimental Result

  • Evaluated on the IWSLT16 datasets
  • With both regularizations, the model is on par with an LSTM-based BPE-to-BPE baseline
  • A max-margin loss performs about as well as the vMF loss

Notes

(Two screenshots of handwritten notes attached, dated 2019-01-31; not reproduced here.)
