
Error in loading AutoTokenizer when correct token passed #33897

Closed
2 of 4 tasks
sanketsudake opened this issue Oct 2, 2024 · 5 comments · Fixed by huggingface/huggingface_hub#2594

sanketsudake commented Oct 2, 2024

System Info

transformers 4.45.1
huggingface-hub 0.25.1
Python 3.10.14

I observed this while fine-tuning meta-llama/Llama-3.2-1B with autotrain.

Who can help?

text models: @ArthurZucker
autotrain: @abhishekkrthakur

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Without a token, the following works fine (the model is already downloaded in the cache):

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

With a valid HF token, the same call fails:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B", token="<valid_token>")

A similar call is made by autotrain here: https://github.com/huggingface/autotrain-advanced/blob/ed8e0c11e251531c319e47d01f5a47b2809d1ce2/src/autotrain/trainers/clm/utils.py#L588

I get the error below:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "/app/env/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/tokenizer.model

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1746, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1666, in get_hf_file_metadata
    r = _request_wrapper(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 364, in _request_wrapper
    response = _request_wrapper(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 388, in _request_wrapper
    hf_raise_for_status(response)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 417, in hf_raise_for_status
    raise _format(EntryNotFoundError, message, response) from e
huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-66fd8440-36f4b222784170220e89c80b;6a061e35-d511-4731-955f-3505d445e401)

Entry Not Found for url: https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/tokenizer.model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2172, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/app/env/lib/python3.10/site-packages/transformers/utils/hub.py", line 416, in cached_file
    resolved_file = hf_hub_download(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1295, in _hf_hub_download_to_cache_dir
    (url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1755, in _get_metadata_or_catch_error
    no_exist_file_path.parent.mkdir(parents=True, exist_ok=True)
  File "/app/env/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/app/.cache/huggingface/hub/models--meta-llama--Llama-3.2-1B/.no_exist/221e3535e1ac4840bdf061a12b634139c84e144c'

Both token=True and token="invalid_token" load the tokenizer fine; the error above occurs only when a valid token is passed.

Expected behavior

The code should work and give preference to the cache:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B", token="<valid_token>")
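The expected cache-first behavior can be sketched generically (hypothetical `resolve_file` helper and `download` callback for illustration; this is not transformers' actual lookup code):

```python
import os

def resolve_file(repo_cache_dir, filename, download):
    """Prefer an already-cached copy; hit the network only when the file is missing."""
    cached = os.path.join(repo_cache_dir, filename)
    if os.path.exists(cached):
        return cached  # cache hit: no Hub request needed, token irrelevant
    return download(filename)  # cache miss: fall back to downloading
```

With this preference, passing a valid token would only change behavior on a cache miss.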
ArthurZucker (Collaborator) commented:
cc @Wauplin 🤗

SumitdevelopAI commented:

How do I get started on this? I am new here and would like to contribute.

Wauplin (Contributor) commented Oct 9, 2024:

Hi @sanketsudake, I can't tell why you get a PermissionError when running this script on your machine as I'm pretty sure it depends on your setup. Running the snippet above with a valid or invalid token just works for me. In any case, the PermissionError happens in a part of the code that should only be an optional optimization. I opened huggingface/huggingface_hub#2594 to avoid failing in such a case.
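The idea behind making that optimization non-fatal can be sketched as follows (a hypothetical helper mirroring the spirit of huggingface/huggingface_hub#2594, not the exact patch):

```python
from pathlib import Path

def record_no_exist(cache_dir: Path, commit_hash: str, filename: str) -> None:
    """Best-effort write of a '.no_exist' marker for a file absent from the repo.

    The marker only speeds up later lookups, so any OSError (e.g. a
    PermissionError on an NFS-mounted cache) is swallowed rather than
    aborting the whole download.
    """
    try:
        marker_dir = cache_dir / ".no_exist" / commit_hash
        marker_dir.mkdir(parents=True, exist_ok=True)
        (marker_dir / filename).touch()
    except OSError:
        pass  # optimization only: ignore cache-write failures
```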

sanketsudake (Author) commented:
Thanks, @Wauplin, for checking this out. It turns out I had permission issues on the NFS side: one of the cached files/commits was owned by a different user, which caused the error.

Thanks for the fix huggingface/huggingface_hub@2a9efcc
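For anyone hitting the same symptom, mixed ownership in a shared cache can be spotted with a quick scan (a hypothetical diagnostic helper, POSIX-only):

```python
import os
from pathlib import Path

def find_foreign_owned(cache_root):
    """Return cache entries not owned by the current user (POSIX only)."""
    me = os.getuid()
    return [p for p in Path(cache_root).rglob("*") if p.stat().st_uid != me]
```

Any path it returns (e.g. under ~/.cache/huggingface/hub) is a candidate for a chown before retrying.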
