Hi everyone. I would like to run inference with the English-to-Korean translation model (https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-kor) and replicate its reported BLEU score.
I downloaded the files there and installed marian-nmt on Ubuntu 20.04.3, including protobuf so that SentencePiece can be used, as required in https://marian-nmt.github.io/docs/ .
I ran preprocess.sh, ran marian-decoder on its output to get the translations, and finally ran postprocess.sh.
The results were unexpected: there were no Korean characters at all.
Am I doing something wrong?
Ah, I forgot to include the vocab files that are mentioned in the decoder.yml. Thanks for pointing me in that direction. The *.vocab.yml file is not the correct one here, because this model comes with separate vocabularies for the source and target languages; you can see this in the decoder.yml file. The *.vocab files mentioned there are missing, however. You can use the spm files directly instead. Edit the decoder.yml file to look like this:
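A minimal sketch of such an edit, under assumptions: the file names `source.spm` and `target.spm` are assumed names for the SentencePiece models shipped with the model package, and `model.npz` is a hypothetical placeholder for whatever model file the shipped decoder.yml already lists. Only the `vocabs` entries change; all other settings should stay as they are in the original file. This relies on Marian having been built with SentencePiece support, as described in the question.

```yaml
# Sketch only: keep every other setting from the shipped decoder.yml unchanged.
relative-paths: true
models:
  - model.npz          # hypothetical name; keep the model file listed in the original decoder.yml
vocabs:
  - source.spm         # assumed name of the source-side SentencePiece model
  - target.spm         # assumed name of the target-side SentencePiece model
```

The first `vocabs` entry is the source-language vocabulary and the second is the target-language one, which matches the separate source and target vocabularies this model uses.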