



Figure 1: Detailed architecture of the Hierarchical Transformer Encoder (HT-Encoder): The main inductive bias incorporated in this model is to encode the full dialog context hierarchically, in two stages. This is done by two encoders, 1) the Shared Utterance Encoder (M layers) and 2) the Context Encoder (N layers), as shown in the figure. The Shared Utterance Encoder first encodes each utterance u_i individually to extract utterance-level features; the same shared parameters are used for encoding every utterance in the context. In the second stage, the Context Encoder encodes the full context with a single transformer encoder to extract dialog-level features. The attention mask in the Context Encoder determines how the context is encoded and is a choice left to the user; the one depicted in the figure is for the HIER model described in Section 2.3 of the paper. Only the final utterance in the Context Encoder gets to attend over all the previous utterances, as shown. This gives the model access to both utterance-level and dialog-level features up to the last layer of the encoding process. Notation: utterance u_i = (w_i1, w_i2, ...), where w_ij is the word embedding of the j-th word in the i-th utterance.
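
The two-stage encoding described above can be sketched with vanilla PyTorch modules. The snippet below is only illustrative: the module names, layer counts, and tensor shapes are assumptions, and it omits the UT-/CT-Masks that the actual HT-Encoder applies in each stage.

import torch
import torch.nn as nn

d_model, M, N = 512, 3, 3  # assumed sizes, not the paper's exact configuration

# 1) Shared Utterance Encoder: the same M-layer encoder is applied to every utterance
shared_utt_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8), num_layers=M)
# 2) Context Encoder: a single N-layer encoder over the concatenated context
context_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8), num_layers=N)

# Toy context: 4 utterances of 10 tokens each, batch size 2 (layout: seq_len x batch x d_model)
utterances = [torch.randn(10, 2, d_model) for _ in range(4)]

# Stage 1: encode each utterance independently with the shared encoder
utt_features = [shared_utt_encoder(u) for u in utterances]

# Stage 2: concatenate along the sequence axis and encode the full dialog;
# the CT-Mask (omitted here) would control which utterances may attend to which
context = torch.cat(utt_features, dim=0)       # (4*10) x 2 x d_model
dialog_features = context_encoder(context)     # same shape, now with dialog-level features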

HIER - PyTorch

Implementation of HIER, in PyTorch

Title: Hierarchical Transformer for Task Oriented Dialog Systems. Bishal Santra, Potnuru Anusha and Pawan Goyal (NAACL 2021, Long Paper). Paper: https://arxiv.org/abs/2011.08067

Install

Coming soon...

Usage [python tests.py]

import torch
from hier_transformer_pytorch import HIERTransformer, get_hier_encoder_mask

# Model
hier_transformer = HIERTransformer(nhead=16, num_encoder_layers=12, vocab_size=1000)  # d_model left at its default (512, per the output shape below)

# Random input
src = torch.randint(0, 1000, (10, 32)).long() # S x N
tgt = torch.randint(0, 1000, (20, 32)).long() # T x N
src_padding_mask = torch.tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]).bool().unsqueeze(0).expand(32, -1)  # N x S, True marks padding positions
utt_indices = torch.tensor([0, 0, 1, 1, 1, 2, 2, 3, 3, 3]).unsqueeze(0).expand(32, -1)  # N x S, utterance index of each source position

# forward
out = hier_transformer.forward(src, tgt, utt_indices=utt_indices, src_key_padding_mask=src_padding_mask)

print(f"src: {src.shape}, tgt: {tgt.shape} -> out: {out.shape}")
# Output: src: torch.Size([10, 32]), tgt: torch.Size([20, 32]) -> out: torch.Size([20, 32, 512])
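
The utt_indices tensor labels every source position with the index of the utterance it belongs to. If the dialog context is packed into src with a separator token at the end of each utterance, these indices can be derived with a cumulative sum. The sketch below is a hypothetical helper (SEP_ID and make_utt_indices are not part of this package's API):

import torch

SEP_ID = 3  # assumed end-of-utterance token id

def make_utt_indices(src):
    # src: S x N token ids; returns N x S utterance indices
    src_nb = src.t()                                 # N x S
    is_sep = (src_nb == SEP_ID).long()
    # each separator closes its own utterance, the next token starts a new one
    return torch.cumsum(is_sep, dim=1) - is_sep

src = torch.tensor([[5, 7, 3, 8, 9, 3, 6, 3]]).t()   # one dialog, three utterances (S x N)
print(make_utt_indices(src))
# tensor([[0, 0, 0, 1, 1, 1, 2, 2]])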

Designing UT-Mask and CT-Mask(s)

TIPS:

  1. Padding positions attend to padding positions, to prevent NaN outputs from the softmax in attention. Otherwise we end up computing softmax([-inf, -inf, ..., -inf]), the undefined 0/0 situation.
  2. No row of the mask should be all ones, as that would mean the corresponding token attends to nothing from the previous layer, which causes the same numerical instability described above.
  3. The UT-Mask and CT-Mask should also be designed so that valid content tokens never attend to padding positions (see the sketch after this list).
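
As one way to satisfy all three tips, the sketch below builds a block-diagonal utterance-level mask from per-token utterance indices and a padding mask. It follows the convention above that a 1 (True) means "do not attend"; make_ut_mask is a hypothetical helper, not the package's get_hier_encoder_mask.

import torch

def make_ut_mask(utt_idx, pad_mask):
    # utt_idx: (S,) utterance index per source position
    # pad_mask: (S,) bool, True at padding positions
    # returns: (S, S) bool mask, True means "do not attend"
    same_utt = utt_idx.unsqueeze(0) == utt_idx.unsqueeze(1)         # block-diagonal by utterance
    valid = ~pad_mask
    allowed = same_utt & valid.unsqueeze(0) & valid.unsqueeze(1)    # tip 3: never attend to padding
    eye = torch.eye(utt_idx.numel(), dtype=torch.bool)
    allowed |= eye & pad_mask.unsqueeze(1)                          # tips 1-2: padding attends to itself
    return ~allowed

utt_idx = torch.tensor([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])    # as in the usage example above
pad_mask = torch.tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 1]).bool()
mask = make_ut_mask(utt_idx, pad_mask)
assert not mask.all(dim=1).any()                           # tip 2: no row is fully masked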

[Figures: CT-Mask patterns for the HIER and HIER-CLS variants]

Running the Experiments

Coming soon...

Citations

@misc{santra2021hierarchical,
      title={Hierarchical Transformer for Task Oriented Dialog Systems}, 
      author={Bishal Santra and Potnuru Anusha and Pawan Goyal},
      year={2021},
      eprint={2011.08067},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements

We thank the authors and developers

