
Error in loading AutoTokenizer when correct token passed #33897

Closed
2 of 4 tasks
sanketsudake opened this issue Oct 2, 2024 · 5 comments · Fixed by huggingface/huggingface_hub#2594

sanketsudake commented Oct 2, 2024

System Info

transformers 4.45.1
huggingface-hub 0.25.1
Python 3.10.14

I observed this while fine-tuning meta-llama/Llama-3.2-1B with autotrain.

Who can help?

text models: @ArthurZucker
autotrain: @abhishekkrthakur

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Without a token, the following works fine (the model is already downloaded in the cache):

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

With a valid HF token, the same call fails:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B", token="<valid_token>")

A similar call is made by autotrain here: https://github.com/huggingface/autotrain-advanced/blob/ed8e0c11e251531c319e47d01f5a47b2809d1ce2/src/autotrain/trainers/clm/utils.py#L588

I get the error below:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "/app/env/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/tokenizer.model

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1746, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1666, in get_hf_file_metadata
    r = _request_wrapper(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 364, in _request_wrapper
    response = _request_wrapper(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 388, in _request_wrapper
    hf_raise_for_status(response)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 417, in hf_raise_for_status
    raise _format(EntryNotFoundError, message, response) from e
huggingface_hub.errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-66fd8440-36f4b222784170220e89c80b;6a061e35-d511-4731-955f-3505d445e401)

Entry Not Found for url: https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/tokenizer.model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/env/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 907, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/app/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2172, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/app/env/lib/python3.10/site-packages/transformers/utils/hub.py", line 416, in cached_file
    resolved_file = hf_hub_download(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1232, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1295, in _hf_hub_download_to_cache_dir
    (url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
  File "/app/env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1755, in _get_metadata_or_catch_error
    no_exist_file_path.parent.mkdir(parents=True, exist_ok=True)
  File "/app/env/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: '/app/.cache/huggingface/hub/models--meta-llama--Llama-3.2-1B/.no_exist/221e3535e1ac4840bdf061a12b634139c84e144c'

Both token=True and token="invalid_token" load the tokenizer fine; the error above occurs only when a valid token is passed.

Expected behavior

The code should work and give preference to the cache:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B", token="<valid_token>")
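The expected cache-first behavior can be sketched generically (hypothetical `resolve_file` helper and `download` callback for illustration; this is not transformers' actual lookup code):

```python
import os

def resolve_file(repo_cache_dir, filename, download):
    """Prefer an already-cached copy; hit the network only when the file is missing."""
    cached = os.path.join(repo_cache_dir, filename)
    if os.path.exists(cached):
        return cached  # cache hit: no Hub request needed, token irrelevant
    return download(filename)  # cache miss: fall back to downloading
```

With this preference, passing a valid token would only change behavior on a cache miss.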
ArthurZucker (Collaborator) commented:
cc @Wauplin 🤗

SumitdevelopAI commented:

How do I get started on this? I am new here and would like to contribute.

Wauplin (Contributor) commented Oct 9, 2024:

Hi @sanketsudake, I can't tell why you get a PermissionError when running this script on your machine as I'm pretty sure it depends on your setup. Running the snippet above with a valid or invalid token just works for me. In any case, the PermissionError happens in a part of the code that should only be an optional optimization. I opened huggingface/huggingface_hub#2594 to avoid failing in such a case.
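The idea behind making that optimization non-fatal can be sketched as follows (a hypothetical helper mirroring the spirit of huggingface/huggingface_hub#2594, not the exact patch):

```python
from pathlib import Path

def record_no_exist(cache_dir: Path, commit_hash: str, filename: str) -> None:
    """Best-effort write of a '.no_exist' marker for a file absent from the repo.

    The marker only speeds up later lookups, so any OSError (e.g. a
    PermissionError on an NFS-mounted cache) is swallowed rather than
    aborting the whole download.
    """
    try:
        marker_dir = cache_dir / ".no_exist" / commit_hash
        marker_dir.mkdir(parents=True, exist_ok=True)
        (marker_dir / filename).touch()
    except OSError:
        pass  # optimization only: ignore cache-write failures
```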

sanketsudake (Author) commented:
Thanks, @Wauplin, for checking this out. It turns out I had permission issues on the NFS side: one of the cached files/commits was owned by a different user, which caused the error.

Thanks for the fix huggingface/huggingface_hub@2a9efcc
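For anyone hitting the same symptom, mixed ownership in a shared cache can be spotted with a quick scan (a hypothetical diagnostic helper, POSIX-only):

```python
import os
from pathlib import Path

def find_foreign_owned(cache_root):
    """Return cache entries not owned by the current user (POSIX only)."""
    me = os.getuid()
    return [p for p in Path(cache_root).rglob("*") if p.stat().st_uid != me]
```

Any path it returns (e.g. under ~/.cache/huggingface/hub) is a candidate for a chown before retrying.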
