
Fix minor mistakes in vocab-embedding extension #2899

Closed
wants to merge 3 commits

Conversation

HarshTrivedi
Contributor

1. Make sure `num_embeddings` is updated after embedding extension.
2. Actually pass `pretrained_file` from `from_params` to `Embedding` initializer. Don't know why this was missed!
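
For context, a minimal sketch of what these two fixes amount to, using a hypothetical, simplified embedding class rather than AllenNLP's actual `Embedding` (names and shapes here are illustrative, not the merged diff):

```python
# Illustrative sketch only -- a hypothetical, simplified stand-in for the real class.
from typing import Optional

import torch


class ToyEmbedding(torch.nn.Module):
    def __init__(self, num_embeddings: int, embedding_dim: int,
                 pretrained_file: Optional[str] = None) -> None:
        super().__init__()
        self.num_embeddings = num_embeddings
        # Fix 2 (in spirit): keep the pretrained file around so a later
        # vocab extension can re-read it instead of only random-initialising.
        self._pretrained_file = pretrained_file
        self.weight = torch.nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def extend_vocab(self, extended_num_embeddings: int) -> None:
        extra_rows = extended_num_embeddings - self.num_embeddings
        extra_weight = torch.nn.init.xavier_uniform_(
            torch.empty(extra_rows, self.weight.size(1)))
        self.weight = torch.nn.Parameter(
            torch.cat([self.weight.data, extra_weight], dim=0))
        # Fix 1: update the bookkeeping attribute; leaving it stale means any
        # later code that trusts num_embeddings (or a second extension) misbehaves.
        self.num_embeddings = extended_num_embeddings


embedding = ToyEmbedding(num_embeddings=10, embedding_dim=4)
embedding.extend_vocab(15)
assert embedding.num_embeddings == embedding.weight.size(0) == 15
```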
@schmmd schmmd added this to the 0.8.4 milestone May 29, 2019
@joelgrus
Contributor

Looks fine. If this was secretly broken, should we add a test for it?

@schmmd
Member

schmmd commented May 30, 2019

@harsh19 I'm merging to unblock cutting a release--but please follow up with a PR with tests.

@HarshTrivedi
Contributor Author

@joelgrus sorry, I didn't see the message yesterday and have been busy with something since morning. I will add the tests.

@HarshTrivedi
Contributor Author

Making sure the `_pretrained_file` attribute is set is still a problem. The goal of that attribute was to remember which pretrained file was used, so that we can use it directly for extension (at fine-tuning/evaluation time). But when an archive is loaded, `remove_pretrained_embedding_params` strips it from the params, so `from_params` of `Embedding` never sees the `pretrained_file` key.

@joelgrus any thoughts on what can be done?
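
For reference, the helper in question recursively drops every `pretrained_file` entry from the archived config before the model is rebuilt, which is why `Embedding.from_params` never sees the key at load time. Roughly (a paraphrased sketch, not the exact AllenNLP source):

```python
from allennlp.common import Params


def remove_pretrained_embedding_params(params: Params) -> None:
    # Called on the archived model config before re-instantiating the model,
    # so that loading does not require the (possibly huge or absent) embedding file.
    if "pretrained_file" in params:
        del params["pretrained_file"]
    for value in params.values():
        if isinstance(value, Params):
            remove_pretrained_embedding_params(value)
```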

@joelgrus
Contributor

I can think of two obvious options:

  1. modify the `remove_pretrained_embedding_params` function to skip `Embedding` modules
  2. rename that parameter to `original_pretrained_file` or something like that

@schmmd
Member

schmmd commented May 31, 2019

Well shoot--I meant to merge this, but instead I updated the branch and never revisited it, so it's not in the last release. From the discussion above, perhaps that's a good thing though.

@HarshTrivedi
Contributor Author

  1. If I skip `pretrained_file` keys of `Embedding` in `remove_pretrained_embedding_params`, I think it defeats its purpose. When a model archive is loaded, this line can potentially fail if the path to the pre-trained embedding file is not available.

  2. I don't understand how this can help. The problem is that the params here don't have `pretrained_file` when the archive is loaded.

A combination of both should work, though. I can change the `remove_pretrained_embedding_params` function to rename `pretrained_file` keys of the `Embedding` module to `original_pretrained_file`, and then change `from_params` of `Embedding` to handle both `pretrained_file` and `original_pretrained_file`. Basically, changing this line to

pretrained_file = params.pop("pretrained_file", None) or params.pop("original_pretrained_file", None)
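
A hypothetical sketch of that combination, renaming the key instead of deleting it (names follow the suggestion above; this was never a merged patch):

```python
from allennlp.common import Params


def remove_pretrained_embedding_params(params: Params) -> None:
    # Instead of dropping the key, rename it so the archived config still
    # records which file the weights originally came from, without forcing
    # the loader to re-read that file at load time.
    if "pretrained_file" in params:
        params["original_pretrained_file"] = params.pop("pretrained_file")
    for value in params.values():
        if isinstance(value, Params):
            remove_pretrained_embedding_params(value)


# ...and inside Embedding.from_params, accept either key:
# pretrained_file = (params.pop("pretrained_file", None)
#                    or params.pop("original_pretrained_file", None))
```

With that in place, an `Embedding` rebuilt from an archived config would still know which file the original rows came from, and could use it when extending the vocabulary at fine-tuning or evaluation time.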

@matt-gardner
Contributor

@joelgrus, @HarshTrivedi, what's the status of this PR?

@HarshTrivedi
Contributor Author

@matt-gardner the problem isn't solved yet. This will require some more work. I am planning to get back on this after my internship, which is ending in 2 weeks. Hope that's fine.

@schmmd
Member

schmmd commented Sep 6, 2019

@harsh19 do you still intend to work on this?

@harsh19

harsh19 commented Sep 6, 2019

I think you intended to tag @HarshTrivedi

@schmmd
Member

schmmd commented Sep 6, 2019

Oops--sorry. @HarshTrivedi do you still intend to work on this?

@schmmd
Member

schmmd commented Dec 13, 2019

Please comment here if you'd like us to reopen this PR.

@schmmd schmmd closed this Dec 13, 2019