Transfer learning from large-scale pre-trained models based on the Transformer architecture has been gaining popularity for downstream applications due to its promising gains in model performance. In real-world applications, however, we often face additional constraints on latency, throughput, and memory. This folder showcases several techniques that are commonly used to speed up inference for Transformer models while retaining the majority of their performance.
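As one illustration of this family of techniques (not necessarily one covered in this folder), post-training quantization stores weights as low-precision integers and dequantizes them at inference time, trading a small amount of accuracy for lower memory use and faster compute. A minimal sketch of the core idea, symmetric int8 quantization of a weight tensor, with illustrative helper names:

```python
# Sketch of symmetric int8 post-training quantization: float weights
# are mapped to 8-bit integers with a single per-tensor scale, then
# dequantized on the fly at inference. Helper names are illustrative.

def quantize_int8(weights):
    """Map float weights to int8 values plus a scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-weight quantization error is bounded by scale / 2.
```

Storing `q` instead of `weights` cuts memory roughly 4x versus float32, and on hardware with int8 kernels the matrix multiplies themselves also run faster.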