Skip to content

v4.19.0: OPT, FLAVA, YOLOS, RegNet, TAPEX, Data2Vec vision, FSDP integration

Compare
Choose a tag to compare
@LysandreJik LysandreJik released this 12 May 14:57
· 7131 commits to main since this release

Disclaimer: this release is the first release with no Python 3.6 support.

OPT

The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-sourced large causal language models which perform similar in performance to GPT3.

FLAVA

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.

The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

YOLOS

The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

RegNet

The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They first start from a high dimensional search space and iteratively reduce the search space by empirically applying constraints based on the best-performing models sampled by the current search space.

TAPEX

The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as performing table fact checking.

Data2Vec: vision

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

FSDP integration in Trainer

PyTorch recently upstreamed the Fairscale FSDP into PyTorch Distributed with additional optimizations. This PR is aimed at integrating it into Trainer API.

It enables Distributed Training at Scale. It's a wrapper for sharding Module parameters across data parallel workers. This is inspired by Xu et al. as well as the ZeRO Stage 3 from DeepSpeed.
PyTorch FSDP will focus more on production readiness and long-term support. This includes better integration with ecosystems and improvements on performance, usability, reliability, debuggability and composability.

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers; starting with Spanish (572M speakers worldwide).

Improvements and bugfixes

Significant community contributions

The following contributors have made significant changes to the library over the last release: