Release v4.19.0: OPT, FLAVA, YOLOS, RegNet, TAPEX, Data2Vec vision, FSDP integration · huggingface/transformers

Disclaimer: this is the first release without Python 3.6 support.

OPT

The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-source large causal language models whose performance is similar to that of GPT-3.

Add OPT by @younesbelkada in #17088

FLAVA

The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela, and was accepted at CVPR 2022.

The paper aims to create a single unified foundation model that works across vision, language, and vision-and-language multimodal tasks.

[feat] Add FLAVA model by @apsdehal in #16654

YOLOS

The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

Add YOLOS by @NielsRogge in #16848

RegNet

The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

The authors design search spaces to perform Neural Architecture Search (NAS). They start from a high-dimensional search space and iteratively reduce it by empirically applying constraints based on the best-performing models sampled from the current search space.
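The iterative reduction described above can be pictured with a toy loop. This is only a sketch under stated assumptions, not the authors' procedure: the two dimensions (`depth`, `width`), the `toy_score` function, and the sample counts are all hypothetical stand-ins, and a real search would train and evaluate each sampled network.

```python
import random


def sample(space, rng):
    """Draw one configuration uniformly from the current search space."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def toy_score(config):
    """Hypothetical stand-in for model quality (higher is better)."""
    return -((config["depth"] - 20.0) ** 2) - ((config["width"] - 64.0) ** 2)


def shrink(space, score_fn, rng, n_samples=200, keep=20):
    """Restrict every dimension to the range covered by the best samples."""
    samples = [sample(space, rng) for _ in range(n_samples)]
    best = sorted(samples, key=score_fn, reverse=True)[:keep]
    return {
        name: (min(c[name] for c in best), max(c[name] for c in best))
        for name in space
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
space = {"depth": (1.0, 100.0), "width": (8.0, 512.0)}
initial_space = dict(space)

for step in range(3):
    space = shrink(space, toy_score, rng)
    print(step, {k: (round(lo, 1), round(hi, 1)) for k, (lo, hi) in space.items()})
```

Each pass keeps only the region where the best sampled configurations fall, so the space narrows from one step to the next.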

RegNet by @FrancescoSaverioZuppichini in #16188

TAPEX

The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions about tabular data and to perform table fact checking.
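To make the "neural SQL executor" idea concrete, the sketch below builds one synthetic pre-training pair: a SQL query plus a linearized table as the model input, and the result of actually executing the query as the target. It uses Python's built-in `sqlite3` as the executor. The table contents, the helper names (`execute_sql`, `flatten_table`), and the linearization format are illustrative assumptions, not the TAPEX data pipeline itself.

```python
import sqlite3


def execute_sql(header, rows, query):
    """Run `query` against an in-memory table named `t`; return the first column."""
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        conn.execute(f"CREATE TABLE t ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


def flatten_table(header, rows):
    """Linearize a table into one string (hypothetical format for illustration)."""
    parts = ["col : " + " | ".join(header)]
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i} : " + " | ".join(str(v) for v in row))
    return " ".join(parts)


header = ["city", "year"]
rows = [("athens", 1896), ("paris", 1900), ("london", 1908)]
query = "SELECT city FROM t WHERE year > 1899 ORDER BY year"

# Input the model would read, and the answer it would be trained to produce.
model_input = query + " " + flatten_table(header, rows)
target = ", ".join(str(v) for v in execute_sql(header, rows, query))

print(model_input)
print(target)
```

The model never runs SQL at inference time; the executor is only used to produce ground-truth answers for synthetic queries.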

Add TAPEX by @NielsRogge in #16473

Data2Vec: vision

The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

The vision model is added in v4.19.0.

[Data2Vec] Add data2vec vision by @patrickvonplaten in #16760
Add Data2Vec for Vision in TF by @sayakpaul in #17008

FSDP integration in Trainer

PyTorch recently upstreamed Fairscale FSDP into PyTorch Distributed with additional optimizations. The PR below integrates it into the Trainer API.

FSDP enables distributed training at scale. It is a wrapper that shards module parameters across data-parallel workers, inspired by Xu et al. and by ZeRO Stage 3 from DeepSpeed.
PyTorch FSDP will focus more on production readiness and long-term support. This includes better integration with ecosystems and improvements on performance, usability, reliability, debuggability and composability.
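As a rough picture of what "sharding parameters across data parallel workers" means, the pure-Python sketch below splits a flat parameter list into equal shards, one per worker, and rebuilds the full list with an all-gather step. It is a conceptual illustration only: the function names are hypothetical, no PyTorch API is used, and the real implementation works on tensors and handles communication, gradients, and optimizer state as well.

```python
def shard(flat_params, world_size):
    """Split a flat parameter list into equal-sized shards, one per worker.

    The list is zero-padded so that every worker holds the same number of values.
    """
    shard_size = -(-len(flat_params) // world_size)  # ceiling division
    padding = [0.0] * (shard_size * world_size - len(flat_params))
    padded = flat_params + padding
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards, numel):
    """Rebuild the full parameter list from all shards, dropping the padding."""
    full = [value for worker_shard in shards for value in worker_shard]
    return full[:numel]


params = [0.1 * i for i in range(10)]
shards = shard(params, world_size=4)

# Each worker stores only its own shard between uses...
for rank, worker_shard in enumerate(shards):
    print(rank, len(worker_shard))

# ...and the full parameters are reassembled only when they are needed.
restored = all_gather(shards, len(params))
print(restored == params)
```

The memory saving comes from each worker keeping only its own shard most of the time, instead of a full replica of the model.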

PyTorch FSDP integration in Trainer by @pacman100 in #17136

Training scripts

New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

Add image classification script, no trainer by @NielsRogge in #16727
Add semantic script no trainer, v2 by @NielsRogge in #16788
Add semantic script, trainer by @NielsRogge in #16834

Documentation in Spanish

To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

Added es version of language_modeling.mdx doc by @jQuinRivero in #17021
Spanish translation of the file philosophy.mdx by @jkmg in #16922
Documentation: Spanish translation of fast_tokenizers.mdx by @jloayza10 in #16882
Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by @omarespejel in #16685
Spanish translation of the file multilingual.mdx by @SimplyJuanjo in #16329
Added spanish translation of autoclass_tutorial. by @duedme in #17069
Fix style error in Spanish docs by @osanseviero in #17197

Improvements and bugfixes

[modeling_utils] rearrange text by @stas00 in #16632
Added Annotations for PyTorch models by @anmolsjoshi in #16619
Allow the same config in the auto mapping by @sgugger in #16631
Update no_trainer scripts with new Accelerate functionalities by @muellerzr in #16617
Fix doc example by @NielsRogge in #16448
Add inputs vector to calculate metric method by @lmvasque in #16461
[megatron-bert-uncased-345m] fix conversion by @stas00 in #16639
Remove parent/child tests in auto model tests by @sgugger in #16653
Updated _load_pretrained_model_low_mem to check if keys are in the state_dict by @FrancescoSaverioZuppichini in #16643
Update Support image on README.md by @BritneyMuller in #16615
bert: properly mention deprecation of TF2 conversion script by @stefan-it in #16171
add vit tf doctest with @add_code_sample_docstrings by @johko in #16636
Fix error in doc of DataCollatorWithPadding by @secsilm in #16662
Fix QA sample by @ydshieh in #16648
TF generate refactor - Beam Search by @gante in #16374
Add tests for no_trainer and fix existing examples by @muellerzr in #16656
only load state dict when the checkpoint is not None by @laurahanu in #16673
[Trainer] tf32 arg doc by @stas00 in #16674
Update audio examples with MInDS-14 by @stevhliu in #16633
add a warning in SpmConverter for sentencepiece's model using the byte fallback feature by @SaulLu in #16629
Fix some doc examples in task summary by @ydshieh in #16666
Jia multi gpu eval by @liyongsea in #16428
Generate: min length can't be larger than max length by @gante in #16668
fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist by @sadransh in #16686
[Doctests] Correct task summary by @patrickvonplaten in #16644
Add Doc Test for BERT by @vumichien in #16523
Fix t5 shard on TPU Pods by @agemagician in #16527
update decoder_vocab_size when resizing embeds by @patil-suraj in #16700
Fix TF_MASKED_LM_SAMPLE by @ydshieh in #16698
Rename the method test_torchscript by @ydshieh in #16693
Reduce memory leak in _create_and_check_torchscript by @ydshieh in #16691
Enable more test_torchscript by @ydshieh in #16679
Don't push checkpoints to hub in no_trainer scripts by @muellerzr in #16703
Private repo TrainingArgument by @nbroad1881 in #16707
Handle image_embeds in ViltModel by @ydshieh in #16696
Improve PT/TF equivalence test by @ydshieh in #16557
Fix example logs repeating themselves by @muellerzr in #16669
[Bart] correct doc test by @patrickvonplaten in #16722
Add Doc Test GPT-2 by @ArEnSc in #16439
Only call get_output_embeddings when tie_word_embeddings is set by @smelm in #16667
Update run_translation_no_trainer.py by @raki-1203 in #16652
Qdqbert example add benchmark script with ORT-TRT by @shangz-ai in #16592
Replace assertion with exception by @anmolsjoshi in #16720
Change the chunk_iter function to handle by @Narsil in #16730
Remove duplicate header by @sgugger in #16732
Moved functions to pytorch_utils.py by @anmolsjoshi in #16625
TF: remove set_tensor_by_indices_to_value by @gante in #16729
Add Doc Tests for Reformer PyTorch by @hiromu166 in #16565
[FlaxSpeechEncoderDecoder] Fix input shape bug in weights init by @sanchit-gandhi in #16728
[FlaxWav2Vec2Model] Fix bug in attention mask by @sanchit-gandhi in #16725
add Bigbird ONNX config by @vumichien in #16427
TF generate: handle case without cache in beam search by @gante in #16704
Fix decoding score comparison when using logits processors or warpers by @bryant1410 in #10638
[Doctests] Fix all T5 doc tests by @patrickvonplaten in #16646
Fix #16660 (tokenizers setters of ids of special tokens) by @davidleonfdez in #16661
[from_pretrained] refactor find_mismatched_keys by @stas00 in #16706
Add Doc Test for GPT-J by @ArEnSc in #16507
Fix and improve CTRL doctests by @jeremyadamsfisher in #16573
[modeling_utils] better explanation of ignore keys by @stas00 in #16741
CI: setup-dependent pip cache by @gante in #16751
Reduce Funnel PT/TF diff by @ydshieh in #16744
Add defensive check for config num_labels and id2label by @sgugger in #16709
Add self training code for text classification by @tuvuumass in #16738
[self-scheduled ci] explain where dependencies are by @stas00 in #16757
Fixup no_trainer examples scripts and add more tests by @muellerzr in #16765
[Doctest] added doctest changes for electra by @bhadreshpsavani in #16675
Enabling Tapex in table question answering pipeline. by @Narsil in #16663
[Flax .from_pretrained] Raise a warning if model weights are not in float32 by @sanchit-gandhi in #16762
Fix batch size in evaluation loop by @sgugger in #16763
Make nightly install dev accelerate by @muellerzr in #16783
[deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop by @stas00 in #16717
Kill async pushes when calling push_to_hub with blocking=True by @sgugger in #16755
Improve image classification example by @NielsRogge in #16585
[SpeechEncoderDecoderModel] Fix bug in reshaping labels by @sanchit-gandhi in #16748
Fix issue avoid-missing-comma found at https://codereview.doctor by @code-review-doctor in #16768
[trainer / deepspeed] fix hyperparameter_search by @stas00 in #16740
[modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests by @stas00 in #16657
Fix PT TF ViTMAE by @ydshieh in #16766
Update README.md by @NielsRogge in #16797
Pin Jax to last working release by @sgugger in #16808
CI: non-remote GH Actions now use a python venv by @gante in #16789
TF generate refactor - XLA sample by @gante in #16713
Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed by @allanj in #16786
Create empty venv on cache miss by @gante in #16816
[ViT, BEiT, DeiT, DPT] Improve code by @NielsRogge in #16799
[Quicktour Audio] Improve && remove ffmpeg dependency by @patrickvonplaten in #16723
fix megatron bert convert state dict naming by @Codle in #15820
use base_version to check torch version in torch_less_than_1_11 by @nbroad1881 in #16806
Allow passing encoder_ouputs as tuple to EncoderDecoder Models by @jsnfly in #16814
Refactor issues with yaml by @LysandreJik in #16772
fix _setup_devices in case where there is no torch.distributed package in build by @dlwh in #16821
Clean up semantic segmentation tests by @NielsRogge in #16801
Fix LayoutLMv2 tokenization docstrings by @qqaatw in #16187
Wav2 vec2 phoneme ctc tokenizer optimisation by @ArthurZucker in #16817
[Flax] improve large model init and loading by @patil-suraj in #16148
Some tests misusing assertTrue for comparisons fix by @code-review-doctor in #16771
Type hints added for TFMobileBert by @Dahlbomii in #16505
fix rum_clm.py seeking text column name twice by @dandelin in #16624
Add onnx export of models with a multiple choice classification head by @echarlaix in #16758
[ASR Pipeline] Correct init docs by @patrickvonplaten in #16833
Add doc about attention_mask on gpt2 by @wiio12 in #16829
TF: Add sigmoid activation function by @gante in #16819
Correct Logging of Eval metric to Tensorboard by @Jeevesh8 in #16825
replace Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings by @SaulLu in #16835
Type hints added to Speech to Text by @Dahlbomii in #16506
Improve test_pt_tf_model_equivalence on PT side by @ydshieh in #16731
Add support for bitsandbytes by @manuelciosici in #15622
[Typo] Fix typo in modeling utils by @patrickvonplaten in #16840
add DebertaV2 fast tokenizer by @mingboiz in #15529
Fixing return type tensor with num_return_sequences>1. by @Narsil in #16828
[modeling_utils] use less cpu memory with sharded checkpoint loading by @stas00 in #16844
[docs] fix url by @stas00 in #16860
Fix custom init sorting script by @sgugger in #16864
Fix multiproc metrics in no_trainer examples by @muellerzr in #16865
Long QuestionAnsweringPipeline fix. by @Narsil in #16778
t5: add conversion script for T5X to FLAX by @stefan-it in #16853
tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars by @ghlai9665 in #15901
Adding support for array key in raw dictionnaries in ASR pipeline. by @Narsil in #16827
Return input_ids in ImageGPT feature extractor by @sgugger in #16872
Use ACT2FN to fetch ReLU activation by @eldarkurtic in #16874
Fix GPT-J onnx conversion by @chainyo in #16780
Fix doctest list by @ydshieh in #16878
New features for CodeParrot training script by @loubnabnl in #16851
Add missing entries in mappings by @ydshieh in #16857
TF: rework XLA generate tests by @gante in #16866
Minor fixes/improvements in convert_file_size_to_int by @mariosasko in #16891
Add doc tests for Albert and Bigbird by @vumichien in #16774
Add OnnxConfig for ConvBERT by @chainyo in #16859
TF: XLA repetition penalty by @gante in #16879
Changes in create_optimizer to support tensor parallelism with SMP by @cavdard in #16880
[DocTests] Fix some doc tests by @patrickvonplaten in #16889
add bigbird typo fixes by @chainyo in #16897
Fix doc test quicktour dataset by @patrickvonplaten in #16929
Add missing ckpt in config docs by @ydshieh in #16900
Fix PyTorch RAG tests GPU OOM by @ydshieh in #16881
Fix RemBertTokenizerFast by @ydshieh in #16933
TF: XLA logits processors - minimum length, forced eos, and forced bos by @gante in #16912
TF: XLA Logits Warpers by @gante in #16899
added deit onnx config by @rushic24 in #16887
TF: XLA stable softmax by @gante in #16892
Replace deprecated logger.warn with warning by @sanchit-gandhi in #16876
Fix issue probably-meant-fstring found at https://codereview.doctor by @code-review-doctor in #16913
Limit the use of PreTrainedModel.device by @sgugger in #16935
apply torch int div to layoutlmv2 by @ManuelFay in #15457
FIx Iterations for decoder by @agemagician in #16934
Add onnx config for RoFormer by @skrsna in #16861
documentation: some minor clean up by @mingboiz in #16850
Fix RuntimeError message format by @ftnext in #16906
use original loaded keys to find mismatched keys by @tricktreat in #16920
[Research] Speed up evaluation for XTREME-S by @anton-l in #16785
Fix HubertRobustTest PT/TF equivalence test on GPU by @ydshieh in #16943
Misc. fixes for Pytorch QA examples: by @searchivarius in #16958
[HF Argparser] Fix parsing of optional boolean arguments by @NielsRogge in #16946
Fix distributed_concat with scalar tensor by @Yard1 in #16963
Update custom_models.mdx by @mishig25 in #16964
Fix add-new-model-like when model doesn't support all frameworks by @sgugger in #16966
Fix multiple deletions of the same files in save_pretrained by @sgugger in #16947
Fixup no_trainer save logic by @muellerzr in #16968
Fix doc notebooks links by @sgugger in #16969
Fix check_all_models_are_tested by @ydshieh in #16970
Add -e flag to some GH workflow yml files by @ydshieh in #16959
Update tokenization_bertweet.py by @datquocnguyen in #16941
Update check_models_are_tested to deal with Windows path by @ydshieh in #16973
Add parameter --config_overrides for run_mlm_wwm.py by @conan1024hao in #16961
Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx by @amyeroberts in #16993
set eos_token_id to None to generate until max length by @ydshieh in #16989
Fix savedir for by epoch by @muellerzr in #16996
Update README to latest release by @sgugger in #16997
use scale=1.0 in floats_tensor called in speech model testers by @ydshieh in #17007
Update all require decorators to use skipUnless when possible by @muellerzr in #16999
TF: XLA bad words logits processor and list of processors by @gante in #16974
Make create_extended_attention_mask_for_decoder static method by @pbelevich in #16893
Update README_zh-hans.md by @tarzanwill in #16977
Updating variable names. by @Narsil in #16445
Revert "Updating variable names. by @Narsil in #16445)"
Replace dict/BatchEncoding instance checks by Mapping by @sgugger in #17014
Result of new doc style with fixes by @sgugger in #17015
Add a check on config classes docstring checkpoints by @ydshieh in #17012
Add translating guide by @omarespejel in #17004
update docs of length_penalty by @manandey in #17022
[FlaxGenerate] Fix bug in decoder_start_token_id by @sanchit-gandhi in #17035
Fx with meta by @michaelbenayoun in #16836
[Flax(Speech)EncoderDecoder] Fix bug in decoder_module by @sanchit-gandhi in #17036
Fix typo in RetriBERT docstring by @mpoemsl in #17018
add torch.no_grad when in eval mode by @JunnYu in #17020
Disable Flax GPU tests on push by @sgugger in #17042
Clean up vision tests by @NielsRogge in #17024
[Trainer] Move logic for checkpoint loading into separate methods for easy overriding by @calpt in #17043
Update no_trainer examples to use new logger by @muellerzr in #17044
Fix no_trainer examples to properly calculate the number of samples by @muellerzr in #17046
Allow all imports from transformers by @LysandreJik in #17050
Make the sacremoses dependency optional by @LysandreJik in #17049
Clean up setup.py by @sgugger in #17045
[T5 Tokenizer] Model has no fixed position ids - there is no hardcode… by @patrickvonplaten in #16990
[FlaxBert] Add ForCausalLM by @sanchit-gandhi in #16995
Move test model folders by @ydshieh in #17034
Make Trainer compatible with sharded checkpoints by @sgugger in #17053
Remove Python and use v2 action by @sgugger in #17059
Fix RNG reload in resume training from epoch checkpoint by @sgugger in #17055
Remove device parameter from create_extended_attention_mask_for_decoder by @pbelevich in #16894
Fix hashing for deduplication by @thomasw21 in #17048
Skip RoFormer ONNX test if rjieba not installed by @lewtun in #16981
Remove masked image modeling from BEIT ONNX export by @lewtun in #16980
Make sure telemetry arguments are not returned as unused kwargs by @sgugger in #17063
Type hint complete Albert model file. by @karthikrangasai in #16682
Deprecate model templates by @sgugger in #17062
Update to build via git for accelerate by @muellerzr in #17084
Allow saved_model export of TFCLIPModel in save_pretrained by @seanmor5 in #16886
Fix DeBERTa token_type_ids by @deutschmn in #17082
📝 open fresh PR for pipeline doctests by @stevhliu in #17073
minor change on TF Data2Vec test by @ydshieh in #17085
type hints for pytorch models by @robotjellyzone in #17064
Add type hints for BERTGeneration by @robsmith155 in #17047
Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME by @orieg in #17091
Remove torchhub test by @sgugger in #17097
fix missing "models" in pipeline test module by @ydshieh in #17090
Fix link to example scripts by @stevhliu in #17103
Fix self-push CI report path in cat by @ydshieh in #17111
Added BigBirdPegasus onnx config by @nandwalritik in #17104
split single_gpu and multi_gpu by @ydshieh in #17083
LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens by @ghlai9665 in #17092
Add type hints for BigBirdPegasus and Data2VecText PyTorch models by @robsmith155 in #17123
add mobilebert onnx configs by @manandey in #17029
[WIP] Fix Pyright static type checking by replacing if-else imports with try-except by @d-miketa in #16578
Add the auto_find_batch_size capability from Accelerate into Trainer by @muellerzr in #17068
Fix MLflowCallback end_run() and add support for tags and nested runs by @orieg in #17130
Fix all docs for accelerate install directions by @muellerzr in #17145
LogSumExp trick question_answering pipeline. by @Narsil in #17143
train args defaulting None marked as Optional by @d-miketa in #17156
[trainer] sharded _load_best_model by @stas00 in #17150
[Deepspeed] add many more models to the model zoo test by @stas00 in #12695
Fixing the output of code examples in the preprocessing chapter by @HallerPatrick in #17162
missing file by @stas00 in #17164
Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback by @orieg in #17148
Fix template init by @sgugger in #17163
MobileBERT tokenizer tests by @leondz in #16896
[M2M100 doc] remove duplicate example by @patil-suraj in #17175
Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training by @jianan-gu in #17154
propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs by @arampacha in #17105
Convert image to rgb for clip model by @hengkuanwee in #17101
Add missing RetriBERT tokenizer tests by @mpoemsl in #17017
[WIP] Enable reproducibility for distributed trainings by @hasansalimkanmaz in #16907
Remove unnecessary columns for all dataset types in Trainer by @Yard1 in #17166
Fix LED documentation by @manuelciosici in #17181
Ensure tensors are at least 1d for pad and concat by @Yard1 in #17179
add shift_tokens_right in FlaxMT5 by @patil-suraj in #17188
Remove columns before passing to data collator by @Yard1 in #17187
Remove duplicated os.path.join by @shijie-wu in #17192
Fix contents in index.mdx to match docs' sidebar by @omarespejel in #17198
ViT and Swin symbolic tracing with torch.fx by @michaelbenayoun in #17182
migrate azure blob for beit checkpoints by @donglixp in #16902
Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) by @sayakpaul in #17194

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@anmolsjoshi
- Added Annotations for PyTorch models (#16619)
- Replace assertion with exception (#16720)
- Moved functions to pytorch_utils.py (#16625)
@vumichien
- Add Doc Test for BERT (#16523)
- add Bigbird ONNX config (#16427)
- Add doc tests for Albert and Bigbird (#16774)
@tuvuumass
- Add self training code for text classification (#16738)
@sayakpaul
- Add Data2Vec for Vision in TF (#17008)
@robotjellyzone
- type hints for pytorch models (#17064)
@d-miketa
- [WIP] Fix Pyright static type checking by replacing if-else imports with try-except (#16578)
- train args defaulting None marked as Optional (#17156)
