wrong dimension of bert-base-italian-xxl vocabularies #7

Closed
f-wole opened this issue Mar 13, 2020 · 4 comments

Comments


f-wole commented Mar 13, 2020

Hi, thanks again for these models! I was trying to use the bert-base-italian-xxl models, but I noticed a discrepancy between the vocabulary size declared in the config.json file (32102) and the actual size of the vocabulary (31102). Is it possible that the wrong vocabulary was uploaded?
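
The discrepancy described above can be checked with a few lines of Python. This is only a minimal sketch: it assumes local copies of the model's config.json and a one-token-per-line vocab.txt, and the helper names (`config_vocab_size`, `count_vocab_entries`, `vocab_size_gap`) are made up for illustration and are not part of any library.

```python
import json


def config_vocab_size(config_path):
    """Return the `vocab_size` field declared in a BERT-style config.json."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)["vocab_size"]


def count_vocab_entries(vocab_path):
    """Count the tokens in a vocab.txt file (assumed: one token per line)."""
    with open(vocab_path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def vocab_size_gap(config_path, vocab_path):
    """Declared size minus actual size; 0 means the two files agree."""
    return config_vocab_size(config_path) - count_vocab_entries(vocab_path)
```

With the sizes reported in this issue (32102 declared, 31102 actual), `vocab_size_gap` would return 1000.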

@stefan-it
Collaborator

Hi @f-wole

Thanks for that hint! The vocab file is correct, but the config file contains a wrong vocab size. I'll fix that now :)

@stefan-it
Collaborator

Update on that: unfortunately, I used the vocab size value of 32102 in the configuration for training the model. In order to fix it I would need to re-train the model, which is currently beyond my resources.

However, the model is working and I also did all evaluations with the configuration that is deployed on the model hub.

@f-wole
Author

f-wole commented Mar 13, 2020

Yes, I saw from the dimensions of the word_embeddings matrix that the model expects a vocabulary size of 32102:
embeddings.word_embeddings.weight torch.Size([32102, 768])

So are you suggesting it would be possible to use bert-base-italian-xxl with a vocabulary of size 31102?

@stefan-it
Collaborator

It is possible. I ran evaluations for NER and PoS tagging with the NER example script in the Hugging Face Transformers library.

I just updated the README to mention the vocab and config size mismatch :)

Thanks again for finding this!
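
For readers wondering why the mismatch is harmless, here is a rough sketch of the reasoning, assuming the tokenizer assigns ids 0 to 31101 in vocabulary order (standard for BERT vocab files, but not stated explicitly in this thread). Every id the tokenizer can produce is then a valid row of the 32102-row embedding matrix, and the remaining rows are simply never looked up. The helper names are hypothetical.

```python
EMBEDDING_ROWS = 32102  # first dimension of embeddings.word_embeddings.weight
VOCAB_ENTRIES = 31102   # actual number of tokens in the vocabulary


def unused_embedding_rows(embedding_rows, vocab_entries):
    """Row indices that no token id can reach (assumes ids 0..vocab_entries-1)."""
    if vocab_entries > embedding_rows:
        raise ValueError("vocabulary is larger than the embedding matrix")
    return range(vocab_entries, embedding_rows)


def ids_fit(token_ids, embedding_rows):
    """True if every token id is a valid row index of the embedding matrix."""
    return all(0 <= i < embedding_rows for i in token_ids)


unused = unused_embedding_rows(EMBEDDING_ROWS, VOCAB_ENTRIES)
print(len(unused))                               # 1000
print(ids_fit([0, VOCAB_ENTRIES - 1], EMBEDDING_ROWS))  # True
```

The opposite case (a vocabulary larger than the embedding matrix) would be a real problem, since the lookup would go out of range; that is why the sketch raises an error for it.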
