
tokenizer started throwing this warning, "Truncation was not explicitely activated but max_length is provided a specific value, please use truncation=True to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior." #5397

Closed
saahiluppal opened this issue Jun 30, 2020 · 15 comments
Labels
Core: Tokenization Internals of the library; Tokenization.

Comments

@saahiluppal

Recently while experimenting, BertTokenizer started throwing this warning:

Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'only_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you may want to check this is the right behavior.

I know the warning asks me to provide a truncation value.
I'm asking here because the warning only started appearing this morning.

@LysandreJik
Member

This is because we recently upgraded the library to version v3.0.0, which has an improved tokenizers API. You can either disable warnings or pass truncation=True to remove that warning (as indicated in the warning).
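
For example, a minimal sketch of the second option (the model name here is just a placeholder):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Passing truncation=True together with max_length silences the warning.
encoded = tokenizer.encode_plus(
    "some example text",
    max_length=128,
    truncation=True,
)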

@rainean

rainean commented Jul 2, 2020

How do you disable the warnings for this? I'm encountering the same issue, but I don't want to set truncation=True.

@LysandreJik
Member

LysandreJik commented Jul 2, 2020

You can disable the warnings with:

import logging
logging.basicConfig(level=logging.ERROR)
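
If you'd rather not raise the level for every logger, a more targeted sketch is to silence only the module that emits this message (the logger name is assumed from the WARNING: prefix the message carries):

import logging

# Silence only the tokenizer-base module instead of all loggers.
logging.getLogger("transformers.tokenization_utils_base").setLevel(logging.ERROR)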

@tutmoses

I've changed the logging level and removed max_length, but am still getting this warning:

WARNING:transformers.tokenization_utils_base:Truncation was not explicitely activated but max_length is provided a specific value, please use truncation=True to explicitely truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to truncation.

@LysandreJik
Member

Which version are you running? Can you try installing v3.0.2 to see if it fixes this issue?

@wise-east

I've tried with v3.0.2 and I'm still getting the same warning messages, even after changing the logging level with the code snippet above.

@thomwolf
Member

@tutmoses @wise-east can you give us a self-contained code example reproducing the behavior?

@iamxinxin

I'm running into the same issue.

@saahiluppal
Author

Update the transformers library to v3 and explicitly pass truncation=True when encoding text with the tokenizer.

@RBeaudet

RBeaudet commented Aug 13, 2020

I could reproduce the warning with this code:

from transformers import CamembertTokenizer
from transformers.data.processors.utils import SingleSentenceClassificationProcessor

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")

texts = ["hi", "hello", "salut", "bonjour"]
labels = [0, 0, 1, 1,]

processor = SingleSentenceClassificationProcessor().create_from_examples(texts, labels)
dataset = processor.get_features(tokenizer=tokenizer)

davidalami added a commit to davidalami/transformers-tutorials that referenced this issue Aug 31, 2020
Fixes the issue similar to the mentioned in huggingface/transformers#5397, starting from transformers version v3.0.0.
@jusugac1

jusugac1 commented Sep 1, 2020

Hello,

Using the following command solved the problem:

import logging
logging.basicConfig(level=logging.ERROR)

However, since 15:40 today (Paris time), it no longer works, and the following warning keeps popping up until it crashes Google Colab:

Truncation was not explicitely activated but `max_length` is provided a specific value, please use `truncation=True` to explicitely truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.

Could you please tell me how to solve this? I also tried deactivating truncation in the encode_plus call:

encoded_dict = tokenizer.encode_plus(
    sent,                        # Sentence to encode.
    add_special_tokens=True,     # Add '[CLS]' and '[SEP]'.
    max_length=128,              # Pad & truncate all sentences.
    pad_to_max_length=True,
    return_attention_mask=True,  # Construct attn. masks.
    return_tensors='pt',         # Return pytorch tensors.
    truncation=False,
)

But it did not work.

Thanks for your help/replies,

----------EDIT---------------

I modified my code in the following way by setting truncation=True, as suggested in this thread. It worked perfectly! From what I understand, this takes into account the max_length I'm applying and prevents the warning from coming up.

encoded_dict = tokenizer.encode_plus(
    sent,                        # Sentence to encode.
    add_special_tokens=True,     # Add '[CLS]' and '[SEP]'.
    max_length=128,              # Pad & truncate all sentences.
    pad_to_max_length=True,
    return_attention_mask=True,  # Construct attn. masks.
    return_tensors='pt',         # Return pytorch tensors.
    truncation=True,
)

J.

@Kerry-zzx

truncation=True solves the problem:

tokenizer = BertTokenizer.from_pretrained(cfg.text_model.pretrain)
lengths = [len(tokenizer.tokenize(c)) + 2 for c in captions]
captions_ids = [torch.LongTensor(tokenizer.encode(c, max_length=max_len, pad_to_max_length=True, truncation=True))
                for c in captions]

@swuxyj

swuxyj commented Nov 9, 2020

Not an elegant solution: modify the transformers source code (~/python/site-packages/transformers/tokenization_utils_base.py) at line 1751 to avoid this warning:

            if 0:       #if verbose:
                logger.warning(
                    "Truncation was not explicitely activated but `max_length` is provided a specific value, "
                    "please use `truncation=True` to explicitely truncate examples to max length. "
                    "Defaulting to 'longest_first' truncation strategy. "
                    "If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy "
                    "more precisely by providing a specific strategy to `truncation`."
                )
            truncation = "longest_first"

@IamHimon

Add truncation=True to tokenizer.encode_plus(truncation=True).
Worked for me!

@liuyang1123

This line is effective:
tokenizer.deprecation_warnings["Truncation-not-explicitly-activated"] = True
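
For context, a minimal sketch of how this can be used (the model name is a placeholder): the tokenizer appears to emit the warning only when that key is unset, flipping it to True after the first emission, so pre-setting it suppresses the message for that tokenizer instance.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Pre-mark the warning as already shown for this tokenizer instance.
tokenizer.deprecation_warnings["Truncation-not-explicitly-activated"] = True
ids = tokenizer.encode("some text", max_length=128)  # no truncation warning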
