How to convert models with two vocab files to PyTorch? #22
This model uses two separate vocabularies and does not properly convert to PyTorch / Hugging Face at the moment. Hopefully this will be added to the conversion procedures soon.
Thanks @jorgtied!
The latest conversion scripts in the Transformers library support the conversion of models with two vocabs. You may also check my recipes in https://github.com/Helsinki-NLP/Opus-MT/tree/master/hf
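For reference, a hypothetical invocation of the Tatoeba conversion script mentioned later in this thread. The flag names below are assumptions based on recent Transformers versions; check the script's `--help` for your installed version before running it.

```shell
# Convert a Tatoeba-Challenge Marian model (here: eng-kor) to PyTorch.
# --models and --save_dir are assumed flag names; verify with --help.
python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py \
    --models eng-kor \
    --save_dir ./converted
```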
I removed that model because it was so poor (at least according to the scores). I should create new ones for this language pair.
Hi, I still get the same error. I used the script from Transformers, and I also tried the convert_to_pytorch.py script you suggested; same error. Can you show me the command to convert such a two-vocab model to PyTorch? Thanks
More resources on these split-vocab models would be helpful. I'm also trying to compile these to CTranslate2 and having difficulties due to the split vocabs.
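For the CTranslate2 side, the project ships an OPUS-MT converter that reads a Marian model directory directly. A hedged sketch, assuming the standard entry point and flag names (paths are placeholders):

```shell
# Convert an OPUS-MT / Marian model directory to CTranslate2 format.
# --model_dir / --output_dir are the expected flags; confirm with --help.
ct2-opus-mt-converter \
    --model_dir ./opus-model-eng-kor \
    --output_dir ./ct2-eng-kor
```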
Hi,

I would like to get translation results from the eng-kor model with `transformers.MarianMTModel` and `transformers.MarianTokenizer`. I understand we need to first convert the model to PyTorch format with convert_marian_tatoeba_to_pytorch.py. The eng-kor model has two different vocab sets for the encoder and decoder. How can we use the `transformers.models.marian.convert_marian_to_pytorch.convert` function to do the conversion? Because there is no vocab.yml file in the zip file, I found that line 381 throws an `IndexError: list index out of range` error.

Thanks
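To make the failure mode concrete, here is a minimal stdlib-only sketch of the split-vocab layout described above. The file names (`source.vocab.yml` / `target.vocab.yml`) and the toy parser are assumptions for illustration, not the release's actual format; real code would use PyYAML.

```python
# Sketch: a release with separate encoder/decoder vocabularies, and why a
# converter that expects a single shared vocab.yml finds nothing to index.
import os
import tempfile

workdir = tempfile.mkdtemp()

# Toy stand-ins for the two SentencePiece vocabularies (token -> id).
vocabs = {
    "source.vocab.yml": {"<unk>": 0, "</s>": 1, "hello": 2, "world": 3},
    "target.vocab.yml": {"<unk>": 0, "</s>": 1, "annyeong": 2, "segye": 3},
}
for name, vocab in vocabs.items():
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as f:
        f.writelines(f"{tok}: {idx}\n" for tok, idx in vocab.items())

def load_vocab(path):
    """Parse a flat 'token: id' mapping (toy parser; real code uses PyYAML)."""
    out = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            tok, _, idx = line.rstrip("\n").rpartition(": ")
            out[tok] = int(idx)
    return out

src = load_vocab(os.path.join(workdir, "source.vocab.yml"))
tgt = load_vocab(os.path.join(workdir, "target.vocab.yml"))

# No shared vocab.yml exists, consistent with the reported IndexError when
# the converter indexes into an empty list of matched vocab files.
shared = [p for p in os.listdir(workdir) if p == "vocab.yml"]
print(len(shared), set(src) == set(tgt))  # -> 0 False
```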