AttributeError: 'BertTokenizerFast' object has no attribute 'max_len' #8739

Closed
zcain117 opened this issue Nov 23, 2020 · 13 comments

Comments

@zcain117
Contributor

Environment info

  • transformers version: 4.0.0-rc-1
  • Platform: Linux-4.9.0-14-amd64-x86_64-with-debian-9.13
  • Python version: 3.6.10
  • PyTorch version (GPU?): 1.8.0a0+4ed7f36 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: 8-core TPU training
  • Using TPU

Who can help

albert, bert, GPT2, XLM: @LysandreJik

Information

Model I am using (Bert, XLNet ...): bert and roberta

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: mlm
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

2 examples of failing commands:

python examples/xla_spawn.py \
  --num_cores 8 \
  examples/contrib/legacy/run_language_modeling.py \
  --logging_dir ./tensorboard-metrics \
  --cache_dir ./cache_dir \
  --train_data_file /datasets/wikitext-103-raw/wiki.train.raw \
  --do_train \
  --do_eval \
  --eval_data_file /datasets/wikitext-103-raw/wiki.valid.raw \
  --overwrite_output_dir \
  --output_dir language-modeling \
  --logging_steps 100 \
  --save_steps 3000 \
  --overwrite_cache \
  --tpu_metrics_debug \
  --mlm --model_type=bert \
  --model_name_or_path bert-base-cased \
  --num_train_epochs 3 \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 16

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/transformers/examples/contrib/legacy/run_language_modeling.py", line 359, in _mp_fn
    main()
  File "/transformers/examples/contrib/legacy/run_language_modeling.py", line 279, in main
    data_args.block_size = tokenizer.max_len
AttributeError: 'BertTokenizerFast' object has no attribute 'max_len'

python examples/xla_spawn.py \
  --num_cores 8 \
  examples/contrib/legacy/run_language_modeling.py \
  --logging_dir ./tensorboard-metrics \
  --cache_dir ./cache_dir \
  --train_data_file /datasets/wikitext-103-raw/wiki.train.raw \
  --do_train \
  --do_eval \
  --eval_data_file /datasets/wikitext-103-raw/wiki.valid.raw \
  --overwrite_output_dir \
  --output_dir language-modeling \
  --logging_steps 100 \
  --save_steps 3000 \
  --overwrite_cache \
  --tpu_metrics_debug \
  --mlm --model_type=roberta \
  --tokenizer=roberta-base \
  --num_train_epochs 5 \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 8

Traceback (most recent call last):
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/root/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
    fn(gindex, *args)
  File "/transformers/examples/contrib/legacy/run_language_modeling.py", line 359, in _mp_fn
    main()
  File "/transformers/examples/contrib/legacy/run_language_modeling.py", line 279, in main
    data_args.block_size = tokenizer.max_len
AttributeError: 'RobertaTokenizerFast' object has no attribute 'max_len'

The timing of this issue lines up with #8586
Tests started failing on the evening of Nov 17, a few hours after that PR was submitted

@LysandreJik
Member

It is actually due to #8604, where we removed several deprecated arguments. The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.

Is it possible for you to switch to one of these newer scripts? If not, the fix is to change max_len to model_max_length. We welcome PRs to fix it, but we won't be maintaining that script ourselves, as there exist better alternatives now (which run on TPU too 🙂)
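
For anyone who still needs the legacy script, the patch is the single line shown in the traceback above. A minimal sketch of the idea, assuming a standard transformers v4 install and leaving the rest of the script untouched:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# transformers v4 removed `tokenizer.max_len`; the renamed attribute is:
block_size = tokenizer.model_max_length  # 512 for bert-base-cased

# So in examples/contrib/legacy/run_language_modeling.py (line 279 in the traceback),
# `data_args.block_size = tokenizer.max_len` becomes:
# data_args.block_size = tokenizer.model_max_length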

@zcain117
Contributor Author

Thanks for taking a look! I will try out the new script

@zcain117
Contributor Author

The new runner is working for us on TPUs. Thanks again for the tip!

@Shafi2016

Shafi2016 commented Nov 25, 2020

Hello, everything was working fine until a few days ago. Now I am getting the same error: "data_args.block_size = min(data_args.block_size, tokenizer.max_len)
AttributeError: 'RobertaTokenizerFast' object has no attribute 'max_len'".

I can't switch to a new script as you mentioned. Kindly help me with this error; I do not know how to fix it. Here is my chunk of code.


`!python "/content/transformers/examples/contrib/legacy/run_language_modeling.py" \
    --output_dir "/content/drive/MyDrive/Vancouver" \
    --model_name_or_path roberta-base \
    --do_train \
    --per_gpu_train_batch_size 8 \
    --seed 42 \
    --train_data_file "/content/input_textOC.txt" \
    --block_size 256 \
    --line_by_line \
    --learning_rate 6e-4 \
    --num_train_epochs 3 \
    --save_total_limit 2 \
    --save_steps 200 \
    --weight_decay 0.01 \
     --mlm`

@LysandreJik
Member

It is actually due to #8604, where we removed several deprecated arguments. The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.

Is it possible for you to switch to one of these newer scripts? If not, the fix is to change max_len to model_max_length. We welcome PRs to fix it, but we won't be maintaining that script ourselves, as there exist better alternatives now (which run on TPU too 🙂)

The fix is mentioned above:

fix is to change max_len to model_max_length
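
Applied to the line from that error, the rename looks roughly like this (a sketch assuming the rest of the legacy script stays as-is):

from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
block_size = 256

# Before (fails on transformers v4, where `max_len` was removed):
#   block_size = min(block_size, tokenizer.max_len)
# After:
block_size = min(block_size, tokenizer.model_max_length)  # min(256, 512) -> 256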

@LysandreJik
Member

If you cannot switch scripts, I recommend pinning the library. You're having this error because you're using a legacy script with a master version that is not compatible.

You could pin it to v3.5.1.

@Shafi2016

Thanks, I appreciate your response. However, I am still a beginner. Can you please explain it a bit? How do I pin it to v3.5.1? Does that mean using the old version of Hugging Face Transformers?

@LysandreJik
Member

If you wish to stick to that deprecated example, yes! You can do so by checking out the tag v3.5.1:

git checkout v3.5.1

If you have installed transformers from pypi (and not from source), you should also update your transformers version:

pip install -U transformers==3.5.1

Please note that the script won't be in "/content/transformers/examples/contrib/legacy/run_language_modeling.py" anymore, but in "/content/transformers/examples/language-modeling/run_language_modeling.py"

LysandreJik pushed a commit that referenced this issue Feb 24, 2021
This fixes the deprecated reference to `tokenizer.max_len` by replacing it with `tokenizer.model_max_length` - similar to [issue 8739](#8739) and [PR 8604](#8604).
Example [here](https://colab.research.google.com/gist/poedator/f8776349e5c625ce287fc6fcd312fa1e/tokenizer-max_len-error-in-transformers_glue.ipynb). The error happens when `glue_convert_examples_to_features` is called without the `max_length` parameter specified. In that case, line 119 with the wrong reference gets called. This simple fix should do it.
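
In spirit the commit is the same one-attribute rename, applied inside the GLUE helper when the caller omits `max_length` (a rough sketch of the idea, not the verbatim upstream diff):

# Inside glue_convert_examples_to_features(...), where max_length may be None:
if max_length is None:
    # was: max_length = tokenizer.max_len  (attribute removed in v4)
    max_length = tokenizer.model_max_length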
@jayachaturvedi

It is actually due to #8604, where we removed several deprecated arguments. The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.

Hello, I am facing the same issue with run_language_modeling.py (and more). Where can I find this new file language-modeling/run_{clm, plm, mlm}.py? Thanks!

@LysandreJik
Member

@jayachaturvedi

Thank you!

@mzhadigerov

It is actually due to #8604, where we removed several deprecated arguments. The run_language_modeling.py script is deprecated in favor of language-modeling/run_{clm, plm, mlm}.py.

Is it possible for you to switch to one of these newer scripts? If not, the fix is to change max_len to model_max_length. We welcome PRs to fix it, but we won't be maintaining that script ourselves, as there exist better alternatives now (which run on TPU too 🙂)

Change max_len to model_max_length where?

@riaraju

riaraju commented Jul 22, 2024

fix is to change max_len to model_max_length

In which files should this be fixed?

JasperD-UGent added a commit to JasperD-UGent/diaparser that referenced this issue Dec 10, 2024
Fix deprecated argument that was removed (see huggingface/transformers#8739).