
some questions for code #2

Open

ShellingFord221 opened this issue Mar 28, 2021 · 3 comments

@ShellingFord221 commented Mar 28, 2021

Hi, thanks for the excellent PyTorch implementation of GAN-BERT! I have three questions about the code:

  1. The generator in GAN-BERT is an MLP. In model.py it seems to transform the noise dimension only as 100 -> 512, rather than 100 -> 512 -> 512. Why just 100 -> 512?
  2. Why do we need to load the parameters of the TensorFlow version via convert_to_tf_param_name and get_weights_from_tf in model.py?
  3. In a general GAN, we often train the discriminator several times and then the generator once within each epoch (see the toy sketch after this list). But in ganbert.py, it seems you train the discriminator only once before training the generator once. Could this cause the problem mentioned in issue #1?
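
By the toy sketch I mean something like this self-contained example of the usual schedule (this is not the code in ganbert.py; the models, losses, and the value of d_steps are purely illustrative):

```python
import torch
import torch.nn as nn

# Toy GAN loop: several discriminator updates per generator update.
G = nn.Sequential(nn.Linear(100, 512), nn.LeakyReLU(0.2), nn.Linear(512, 768))
D = nn.Sequential(nn.Linear(768, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
d_steps = 5  # illustrative: k discriminator updates per generator update

for step in range(100):
    h_real = torch.randn(32, 768)                 # stand-in for BERT features of real examples
    for _ in range(d_steps):                      # several discriminator steps ...
        h_fake = G(torch.rand(32, 100)).detach()
        d_loss = bce(D(h_real), torch.ones(32, 1)) + bce(D(h_fake), torch.zeros(32, 1))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    h_fake = G(torch.rand(32, 100))               # ... then a single generator step
    g_loss = bce(D(h_fake), torch.ones(32, 1))    # generator tries to fool D
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```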

Thanks!

@OsmanMutlu (Owner)

Hi,

Thank you for your kind words. To answer your questions:

  1. In the default version of Generator1 in my implementation, the layers of the MLP are 100 -> 512 -> 512, not 100 -> 512 (each arrow represents one linear layer of the MLP). As you can see, I prepend noise_size (the input_size) to the hidden_sizes array, then iterate over each consecutive pair of that array to create a linear layer for each pair, and finally add a last linear layer going from the final element of the array to output_size. This construction makes it possible to build MLPs with any number of hidden layers in a generic way (a minimal sketch follows this list).
  2. The general practice when replicating a paper is to make it easy to transfer weights from the original implementation, so you can check whether the new implementation actually works.
  3. Yes, this could be what causes the problem mentioned in issue #1. I'm not really familiar with GAN training tricks yet; if you can point me to a source for these kinds of tricks, I would appreciate it :)
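
To make the construction concrete, here is a minimal sketch of that generic MLP generator in PyTorch (this is not the exact code in model.py; the argument names and the LeakyReLU/Dropout choices are just illustrative):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Generic MLP generator: noise_size -> hidden_sizes[0] -> ... -> output_size."""
    def __init__(self, noise_size=100, hidden_sizes=(512, 512), output_size=768, dropout=0.1):
        super().__init__()
        sizes = [noise_size] + list(hidden_sizes)         # e.g. [100, 512, 512]
        layers = []
        for in_dim, out_dim in zip(sizes, sizes[1:]):     # one linear layer per consecutive pair
            layers += [nn.Linear(in_dim, out_dim), nn.LeakyReLU(0.2), nn.Dropout(dropout)]
        layers.append(nn.Linear(sizes[-1], output_size))  # final layer: last hidden size -> output_size
        self.net = nn.Sequential(*layers)

    def forward(self, noise):
        return self.net(noise)                            # fake BERT-like representation h_fake

h_fake = Generator()(torch.rand(4, 100))                  # shape: (4, 768)
```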

@ShellingFord221 (Author)

Hi, thanks for your quick reply! I'm still curious about a few things:

  1. Why is output_size 512? The dimension of BERT's output embedding is usually 768 (or a multiple of 768). Since h_fake (generated by the generator) and h_real (produced by BERT) are both fed into the discriminator (i.e. D in Figure 1 of the paper), shouldn't they have the same size?
  2. Why not use the PyTorch version of BERT instead? For example, from transformers import BertPreTrainedModel, BertModel ... (see the sketch after this list).
  3. Yes, I'd like to! Improved Techniques for Training GANs by Goodfellow et al. is a good recommendation, and its repo can be found here.
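
By the sketch I mean something like this with the current transformers API (just an illustration, not the repo's code; the model name, the pooled output, and the small stand-in generator are arbitrary choices):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# h_real from BERT and h_fake from the generator must have the same width
# before either is fed to the discriminator D.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a labeled or unlabeled example", return_tensors="pt")
h_real = bert(**inputs).pooler_output                       # shape: (1, 768)

generator = nn.Sequential(                                  # illustrative stand-in for the generator
    nn.Linear(100, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, bert.config.hidden_size),                # output_size matches BERT's 768
)
h_fake = generator(torch.rand(1, 100))

assert h_real.shape[-1] == h_fake.shape[-1]                 # both go into discriminator D
```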

Looking forward to your reply! ;)

@OsmanMutlu (Owner)

Thanks for the link! To answer your questions:

  1. I actually use 768 in the code. I guess the 512 was left over from one of my earlier implementations.
  2. This is actually a fork of a very early version of the transformers repo (it used to be named "pytorch-pretrained-BERT"). I adopted BERT relatively early and don't want to change my codebase (my fork has some small additions that I use for other things) unless it is absolutely necessary.
  3. Thank you again.
