This repository provides PyTorch implementations of several methods for injecting additional information into your models, such as time embeddings in a diffusion UNet or speaker embeddings in speech synthesis. Enhance your network's performance and capabilities with these conditioning techniques.
- FiLM Layer: Incorporate the feature-wise linear modulation layer from "FiLM: Visual Reasoning with a General Conditioning Layer" to dynamically modulate your model's behavior based on external information.
- Conditional Layer Norm: Utilize the Conditional Layer Norm strategy from AdaSpeech for adaptive and context-aware normalization.
- Style-Adaptive Layer Normalization: Apply the Style-Adaptive Layer Normalization (SALN) from Meta-StyleSpeech to condition the normalization process on external style information.
- Adaptive Instance Normalization (AdaIN): Incorporate Adaptive Instance Normalization, originally introduced for fast and flexible style transfer, which aligns the feature statistics of one input with those of another.
```python
import torch
from layers import FiLMLayer

x = torch.randn((16, 37, 256))  # [batch_size, time, in_channels]
c = torch.randn((16, 1, 320))   # [batch_size, 1, cond_channels]

model = FiLMLayer(256, 320)
output = model(x, c)  # [batch_size, time, in_channels]
```
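For intuition, a FiLM layer typically predicts a per-channel scale and shift from the conditioning vector and applies them to the features. The sketch below illustrates that idea only; the two linear projections and their names are assumptions, not the exact code of `FiLMLayer` in `layers`.

```python
import torch
import torch.nn as nn

class FiLMSketch(nn.Module):
    """Minimal feature-wise linear modulation: x -> gamma(c) * x + beta(c)."""

    def __init__(self, in_channels: int, cond_channels: int):
        super().__init__()
        # Predict a per-channel scale (gamma) and shift (beta) from the condition.
        self.to_gamma = nn.Linear(cond_channels, in_channels)
        self.to_beta = nn.Linear(cond_channels, in_channels)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, in_channels], c: [batch, 1, cond_channels]
        gamma = self.to_gamma(c)   # [batch, 1, in_channels]
        beta = self.to_beta(c)     # [batch, 1, in_channels]
        return gamma * x + beta    # broadcasts over the time dimension
```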
```python
import torch
from layers import ConditionalLayerNorm

x = torch.randn((16, 37, 256))  # [batch_size, time, in_channels]
c = torch.randn((16, 1, 320))   # [batch_size, 1, cond_channels]

model = ConditionalLayerNorm(256, 320)
output = model(x, c)  # [batch_size, time, in_channels]
```
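Conceptually, Conditional Layer Norm in AdaSpeech replaces the fixed affine parameters of layer normalization with a scale and bias predicted from the conditioning embedding. A minimal sketch of that idea follows; the projection layers here are assumptions rather than the repository's exact implementation.

```python
import torch
import torch.nn as nn

class ConditionalLayerNormSketch(nn.Module):
    """LayerNorm whose scale and bias are predicted from a conditioning vector."""

    def __init__(self, in_channels: int, cond_channels: int):
        super().__init__()
        # Normalize without learnable affine parameters ...
        self.norm = nn.LayerNorm(in_channels, elementwise_affine=False)
        # ... and predict scale and bias from the condition instead.
        self.to_scale = nn.Linear(cond_channels, in_channels)
        self.to_bias = nn.Linear(cond_channels, in_channels)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, in_channels], c: [batch, 1, cond_channels]
        scale = self.to_scale(c)   # [batch, 1, in_channels]
        bias = self.to_bias(c)     # [batch, 1, in_channels]
        return scale * self.norm(x) + bias
```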
```python
import torch
from layers import StyleAdaptiveLayerNorm

x = torch.randn((16, 37, 256))  # [batch_size, time, in_channels]
c = torch.randn((16, 1, 320))   # [batch_size, 1, cond_channels]

model = StyleAdaptiveLayerNorm(256, 320)
output = model(x, c)  # [batch_size, time, in_channels]
```
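Style-Adaptive Layer Normalization follows the same pattern: the style vector produces a gain and bias that modulate the normalized features. The sketch below assumes a single linear projection for both, with the gain initialized near 1 as in Meta-StyleSpeech; it is an illustration, not the exact code of `StyleAdaptiveLayerNorm`.

```python
import torch
import torch.nn as nn

class SALNSketch(nn.Module):
    """Style-adaptive layer norm: style vector -> gain and bias for normalized features."""

    def __init__(self, in_channels: int, cond_channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(in_channels, elementwise_affine=False)
        # One projection produces both gain and bias from the style embedding.
        self.to_gain_bias = nn.Linear(cond_channels, 2 * in_channels)
        # Initialize so gain ~ 1 and bias ~ 0, i.e. the layer starts close to plain LayerNorm.
        nn.init.zeros_(self.to_gain_bias.weight)
        with torch.no_grad():
            self.to_gain_bias.bias[:in_channels].fill_(1.0)
            self.to_gain_bias.bias[in_channels:].zero_()

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # x: [batch, time, in_channels], c: [batch, 1, cond_channels]
        gain, bias = self.to_gain_bias(c).chunk(2, dim=-1)
        return gain * self.norm(x) + bias
```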
```python
import torch
from layers import AdaINLayer

# x and c should have the same shape
x = torch.randn((16, 256, 37))  # [batch_size, in_channels, time]
c = torch.randn((16, 256, 37))  # [batch_size, in_channels, time]

model = AdaINLayer(256)
output = model(x, c)  # [batch_size, in_channels, time]
```
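At its core, AdaIN needs no conditioning projections: it normalizes the content features per channel and re-scales them with the per-channel statistics of the condition. A minimal sketch for the [batch, in_channels, time] layout used above (the eps value is an assumption):

```python
import torch

def adain_sketch(x: torch.Tensor, c: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: give x the per-channel statistics of c.

    x, c: [batch, in_channels, time]
    """
    # Per-channel statistics computed over the time dimension.
    x_mean, x_std = x.mean(dim=-1, keepdim=True), x.std(dim=-1, keepdim=True)
    c_mean, c_std = c.mean(dim=-1, keepdim=True), c.std(dim=-1, keepdim=True)
    # Normalize x, then re-scale and re-shift with the statistics of c.
    return c_std * (x - x_mean) / (x_std + eps) + c_mean
```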