Main Authors / Organization
Sachin Kumar & Yulia Tsvetkov
Carnegie Mellon University
PDF link
https://arxiv.org/abs/1812.04616
Hypothesis
Sequence-to-sequence models can be trained by computing the loss directly on continuous outputs (predicted word embeddings) instead of on a softmax distribution.
Because the model has no softmax layer, it can in principle handle an unbounded (open) vocabulary: prediction reduces to emitting a vector and looking up its nearest neighbor in embedding space, as sketched below.
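As a minimal sketch of what decoding without a softmax looks like (the embedding table, words, and dimensions here are hypothetical placeholders, not the paper's setup):

```python
import numpy as np

def nearest_word(e_hat, embedding_table, words):
    """Decode a predicted vector by cosine nearest neighbor in embedding space.

    embedding_table: (vocab_size, dim) array with unit-normalized rows,
    so the dot product equals cosine similarity.
    """
    e = e_hat / np.linalg.norm(e_hat)
    scores = embedding_table @ e
    return words[int(np.argmax(scores))]

# Toy usage with random placeholder embeddings.
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 4))
table /= np.linalg.norm(table, axis=1, keepdims=True)
words = ["the", "cat", "sat", "on", "mat"]
print(nearest_word(rng.normal(size=4), table, words))
```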
Approach
Train with a von Mises-Fisher (vMF) negative log-likelihood loss over output embeddings, together with two regularization variants that encourage high cosine similarity between the predicted vector and the target word's embedding (see the sketch below).
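A minimal NumPy/SciPy sketch of the vMF loss, assuming the formulation from the paper: NLLvMF(ê; e(w)) = -log C_m(‖ê‖) - êᵀe(w), where regularization 1 adds a λ1‖ê‖ penalty and regularization 2 scales the cosine term by λ2. The λ values below are placeholders, not the paper's tuned hyperparameters.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v

def log_cm(kappa, m):
    """log C_m(kappa), the vMF normalization constant in m dimensions.

    Uses the scaled Bessel function for stability: log I_v(k) = log(ive(v, k)) + k.
    """
    v = m / 2.0 - 1.0
    log_bessel = np.log(ive(v, kappa)) + kappa
    return v * np.log(kappa) - (m / 2.0) * np.log(2 * np.pi) - log_bessel

def nll_vmf(e_hat, e_target, lambda1=0.02, lambda2=0.1, variant="reg2"):
    """vMF negative log-likelihood of the (unit-norm) target embedding
    under the distribution parameterized by the model output e_hat.
    """
    kappa = np.linalg.norm(e_hat)   # concentration = norm of the output vector
    dot = e_hat @ e_target          # alignment with the target embedding
    if variant == "reg1":           # penalize large concentration
        return -log_cm(kappa, e_hat.shape[0]) - dot + lambda1 * kappa
    if variant == "reg2":           # down-weight the cosine reward
        return -log_cm(kappa, e_hat.shape[0]) - lambda2 * dot
    return -log_cm(kappa, e_hat.shape[0]) - dot

# Toy usage with random placeholder vectors.
rng = np.random.default_rng(0)
e_hat = rng.normal(size=300)
e_w = rng.normal(size=300)
e_w /= np.linalg.norm(e_w)
print(nll_vmf(e_hat, e_w))
```

Minimizing this loss pushes ê toward the direction of e(w); the regularizers keep the concentration ‖ê‖ from growing without bound.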
Main Experimental Result
Notes