
some questions for code #2

Open

ShellingFord221 opened this issue Mar 28, 2021 · 3 comments

@ShellingFord221 commented Mar 28, 2021

Hi, thanks for the excellent PyTorch implementation of GAN-BERT! I have three questions about the code:

  1. The generator in GAN-BERT is an MLP. In model.py it seems to transform the noise dimension only as 100 -> 512, rather than 100 -> 512 -> 512. Why just 100 -> 512?
  2. Why do we need to load the parameters of the TensorFlow version via convert_to_tf_param_name and get_weights_from_tf in model.py?
  3. In a general GAN, we often train the discriminator several times and then the generator once within each epoch (see the toy sketch after this list). But in ganbert.py, it seems you train the discriminator only once before training the generator once. Could this cause the problem mentioned in issue #1?
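
By the toy sketch I mean something like this self-contained example of the usual schedule (this is not the code in ganbert.py; the models, losses, and the value of d_steps are purely illustrative):

```python
import torch
import torch.nn as nn

# Toy GAN loop: several discriminator updates per generator update.
G = nn.Sequential(nn.Linear(100, 512), nn.LeakyReLU(0.2), nn.Linear(512, 768))
D = nn.Sequential(nn.Linear(768, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
d_steps = 5  # illustrative: k discriminator updates per generator update

for step in range(100):
    h_real = torch.randn(32, 768)                 # stand-in for BERT features of real examples
    for _ in range(d_steps):                      # several discriminator steps ...
        h_fake = G(torch.rand(32, 100)).detach()
        d_loss = bce(D(h_real), torch.ones(32, 1)) + bce(D(h_fake), torch.zeros(32, 1))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    h_fake = G(torch.rand(32, 100))               # ... then a single generator step
    g_loss = bce(D(h_fake), torch.ones(32, 1))    # generator tries to fool D
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```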

Thanks!

@OsmanMutlu (Owner)

Hi,

Thank you for your kind words. To answer your questions:

  1. In the default version of Generator1 in my implementation, the layers of the MLP are 100 -> 512 -> 512, not 100 -> 512 (each arrow represents one linear layer of the MLP). As you can see, I prepend noise_size (the input_size) to the hidden_sizes array, then iterate over each consecutive pair of that array to create a linear layer for each pair, and finally add a last linear layer going from the final element of the array to output_size. This construction makes it possible to build MLPs with any number of hidden layers in a generic way (a minimal sketch follows this list).
  2. The general practice when replicating a paper is to make it easy to transfer weights from the original implementation, so you can check whether the new implementation actually works.
  3. Yes, this could be what causes the problem mentioned in issue #1. I'm not really familiar with GAN training tricks yet; if you can point me to a source for these kinds of tricks, I would appreciate it :)
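
To make the construction concrete, here is a minimal sketch of that generic MLP generator in PyTorch (this is not the exact code in model.py; the argument names and the LeakyReLU/Dropout choices are just illustrative):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Generic MLP generator: noise_size -> hidden_sizes[0] -> ... -> output_size."""
    def __init__(self, noise_size=100, hidden_sizes=(512, 512), output_size=768, dropout=0.1):
        super().__init__()
        sizes = [noise_size] + list(hidden_sizes)         # e.g. [100, 512, 512]
        layers = []
        for in_dim, out_dim in zip(sizes, sizes[1:]):     # one linear layer per consecutive pair
            layers += [nn.Linear(in_dim, out_dim), nn.LeakyReLU(0.2), nn.Dropout(dropout)]
        layers.append(nn.Linear(sizes[-1], output_size))  # final layer: last hidden size -> output_size
        self.net = nn.Sequential(*layers)

    def forward(self, noise):
        return self.net(noise)                            # fake BERT-like representation h_fake

h_fake = Generator()(torch.rand(4, 100))                  # shape: (4, 768)
```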

@ShellingFord221 (Author)

Hi, thanks for your quick reply! I'm still curious about a few things:

  1. Why is output_size 512? The dimension of BERT's output embedding is usually 768 (or a multiple of 768). Since h_fake (generated by the generator) and h_real (produced by BERT) are both fed into the discriminator (i.e. D in Figure 1 of the paper), shouldn't they have the same size?
  2. Why not use the PyTorch version of BERT instead? For example, from transformers import BertPreTrainedModel, BertModel ... (see the sketch after this list).
  3. Yes, I'd like to! Improved Techniques for Training GANs by Goodfellow et al. is a good recommendation, and its repo can be found here.
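
By the sketch I mean something like this with the current transformers API (just an illustration, not the repo's code; the model name, the pooled output, and the small stand-in generator are arbitrary choices):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# h_real from BERT and h_fake from the generator must have the same width
# before either is fed to the discriminator D.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a labeled or unlabeled example", return_tensors="pt")
h_real = bert(**inputs).pooler_output                       # shape: (1, 768)

generator = nn.Sequential(                                  # illustrative stand-in for the generator
    nn.Linear(100, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, bert.config.hidden_size),                # output_size matches BERT's 768
)
h_fake = generator(torch.rand(1, 100))

assert h_real.shape[-1] == h_fake.shape[-1]                 # both go into discriminator D
```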

Looking forward to your reply! ;)

@OsmanMutlu (Owner)

Thanks for the link! To answer your questions:

  1. I actually use 768 in the code. I guess the 512 was left over from one of my earlier implementations.
  2. This is actually a fork of a very early version of the transformers repo (it used to be named "pytorch-pretrained-BERT"). I adopted BERT relatively early and don't want to change my codebase (my fork has some small additions that I use for other things) unless it is absolutely necessary.
  3. Thank you again.
