
von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs #3

Open
zomux opened this issue Jan 31, 2019 · 0 comments


Main Authors / Organization

Sachin Kumar & Yulia Tsvetkov

Carnegie Mellon University

PDF link

https://arxiv.org/abs/1812.04616

Hypothesis

A sequence-to-sequence model can be trained by computing the loss directly on continuous outputs, i.e., predicted word embeddings.

Because the model does not use a softmax over the vocabulary, it can in principle handle an unbounded (open) vocabulary.
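To make the continuous-output idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of how decoding works without a softmax: the model emits a vector, and the output word is the vocabulary entry whose pre-trained embedding is closest in cosine similarity. The names `decode_token` and `emb_table` are hypothetical.

```python
import numpy as np

def decode_token(e_hat, emb_table):
    """Return the index of the word whose embedding is most similar to e_hat.

    e_hat:     (m,) predicted output vector from the decoder
    emb_table: (V, m) pre-trained target embeddings, rows assumed unit-normalized
    """
    # cosine similarity reduces to a dot product with the normalized prediction
    scores = emb_table @ (e_hat / np.linalg.norm(e_hat))
    return int(np.argmax(scores))
```

Because this replaces the softmax projection with a single nearest-neighbor lookup, the output layer's cost no longer grows with vocabulary size.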

Approach

Train with a von Mises-Fisher (vMF) loss on the predicted embeddings, combined with two kinds of regularization that encourage high cosine similarity between the predicted and target embeddings.
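The negative log-likelihood of a vMF distribution can be sketched as below. This is a rough illustration under my reading of the paper: the predicted vector's norm acts as the concentration κ, `log_cm` is the log normalizer of the m-dimensional vMF density, and `lam1`/`lam2` stand in for the paper's two regularization variants (their exact form may differ). `scipy.special.ive` (the exponentially scaled Bessel function) is used for numerical stability.

```python
import numpy as np
from scipy.special import ive  # ive(v, k) = iv(v, k) * exp(-k)

def log_cm(kappa, m):
    """Log normalizer log C_m(kappa) of the vMF density in m dimensions."""
    v = m / 2.0 - 1.0
    log_bessel = np.log(ive(v, kappa)) + kappa  # log I_v(kappa), stably
    return v * np.log(kappa) - (m / 2.0) * np.log(2.0 * np.pi) - log_bessel

def nll_vmf(e_hat, e_w, lam1=0.0, lam2=1.0):
    """Negative log-likelihood vMF loss with illustrative regularization knobs.

    e_hat: (m,) predicted output vector (its norm plays the role of kappa)
    e_w:   (m,) unit-norm target word embedding
    lam1:  penalty on the predicted norm (discourages large kappa)
    lam2:  scaling of the cosine/dot-product term
    """
    kappa = np.linalg.norm(e_hat)
    m = e_hat.shape[0]
    return -log_cm(kappa, m) - lam2 * e_hat.dot(e_w) + lam1 * kappa
```

Note that the `- e_hat.dot(e_w)` term is what rewards cosine alignment: for a fixed norm, the loss is minimized when the prediction points in the same direction as the target embedding.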

Main Experimental Result

  • Evaluated on the IWSLT16 datasets
  • With both regularizations, the model is on par with an LSTM-based BPE-to-BPE baseline
  • A max-margin loss performs about as well as the vMF loss

Notes

(Two screenshots of handwritten notes attached, dated 2019-01-31; not reproduced here.)
