
Student Network

Thanos Masouris edited this page Jul 21, 2022 · 6 revisions

DiStyleGAN

Overview of DiStyleGAN's architecture

Generator

Initially, the class condition (a one-hot encoding) is projected to 128 dimensions by a Fully Connected layer. The resulting condition embedding is then concatenated with a 512-dimensional random noise vector, and the concatenation is passed through another Fully Connected layer, followed by 3 consecutive Upsampling blocks. Each upsampling block consists of an upsample layer (scale_factor=2, mode='nearest'), a 3x3 convolution with padding, a Batch Normalization layer, and a Gated Linear Unit (GLU). Finally, 3 residual blocks [1] and a last convolution produce the fake image.
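The Generator pipeline above can be sketched in PyTorch as follows. Only the details stated in the text (128-dimensional condition embedding, 512-dimensional noise, 3 upsampling blocks, and the upsample/conv/BatchNorm/GLU layout) come from this page; the class count, channel widths, and layer names are illustrative assumptions, and the residual blocks and final convolution are omitted for brevity.

```python
import torch
import torch.nn as nn


class UpBlock(nn.Module):
    """One upsampling block, as described above:
    nearest-neighbour upsample (x2) -> 3x3 conv -> BatchNorm -> GLU.
    The GLU halves the channel dimension, so the conv outputs 2*out_ch."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(in_ch, 2 * out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(2 * out_ch),
            nn.GLU(dim=1),
        )

    def forward(self, x):
        return self.block(x)


class GeneratorStem(nn.Module):
    """Condition projection + noise/condition fusion + 3 upsampling blocks.
    num_classes and base_ch are assumptions, not taken from the page."""

    def __init__(self, num_classes=10, noise_dim=512, cond_dim=128, base_ch=256):
        super().__init__()
        self.base_ch = base_ch
        self.project = nn.Linear(num_classes, cond_dim)        # one-hot -> 128-d
        self.fc = nn.Linear(noise_dim + cond_dim, base_ch * 4 * 4)
        self.up = nn.Sequential(                               # 4x4 -> 32x32
            UpBlock(base_ch, base_ch // 2),
            UpBlock(base_ch // 2, base_ch // 4),
            UpBlock(base_ch // 4, base_ch // 8),
        )

    def forward(self, noise, onehot):
        cond = self.project(onehot)                            # condition embedding
        h = self.fc(torch.cat([cond, noise], dim=1))           # fuse and expand
        h = h.view(-1, self.base_ch, 4, 4)
        return self.up(h)
```

With these assumed widths, a batch of 512-dimensional noise vectors and one-hot labels yields a 32-channel 32x32 feature map, which the (omitted) residual blocks and final convolution would turn into the fake image.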

Discriminator

DiStyleGAN's discriminator consists of 4 consecutive Downsampling blocks (4x4 strided convolution, Spectral Normalization, and a LeakyReLU), each of which reduces the spatial size of the input image by a factor of 2. The resulting feature map is then concatenated with the class-condition embedding and passed through two convolutions to produce the class-conditional discriminator logit used in the loss.
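A downsampling block as described above can be sketched like this; the 4x4 stride-2 convolution with Spectral Normalization and LeakyReLU follows the text, while the channel widths, LeakyReLU slope, and 32x32 input size are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DownBlock(nn.Module):
    """One Discriminator downsampling block:
    4x4 strided convolution with Spectral Normalization, then LeakyReLU.
    The stride-2 convolution halves the spatial size."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.utils.spectral_norm(
                nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.block(x)


# Four consecutive blocks: a 32x32 RGB image is reduced to 2x2.
# The channel progression 64 -> 128 -> 256 -> 512 is an assumption.
blocks = nn.Sequential(
    DownBlock(3, 64), DownBlock(64, 128),
    DownBlock(128, 256), DownBlock(256, 512),
)
```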

The feature maps produced by these four downsampling blocks are the ones used in the Feature Loss term of the objective function.
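A feature loss over those per-block maps can be sketched as follows. The page only states that the four blocks' feature maps enter the loss; the choice of L1 distance and equal weighting per block here are assumptions for illustration, not DiStyleGAN's actual formulation.

```python
import torch
import torch.nn.functional as F


def feature_loss(student_feats, teacher_feats):
    """Sum of per-block distances between two images' discriminator
    feature maps (one per downsampling block). L1 distance and equal
    per-block weights are illustrative assumptions."""
    return sum(F.l1_loss(s, t) for s, t in zip(student_feats, teacher_feats))
```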

References

[1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.