Refine 404 errors #878

Merged
merged 10 commits into from
Jun 13, 2022

Conversation

LysandreJik
Member

This PR adds finer error management to the Hugging Face Hub error messages.

Here's the difference between current main and this PR on the following errors:

An example for hf_api: model_info

Using the model_info method:

In [1]: from huggingface_hub import model_info

In [2]: model_info('random_model')

Before:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 model_info('random_model')

[...]

File ~/Workspaces/Python/huggingface_hub/.env/lib/python3.10/site-packages/requests/models.py:960, in Response.raise_for_status(self)
    957     http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
    959 if http_error_msg:
--> 960     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/random_model

After:

---------------------------------------------------------------------------
RepositoryNotFoundError                   Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 model_info('random_model')

[...]

File ~/Workspaces/Python/huggingface_hub/src/huggingface_hub/utils/_errors.py:29, in _raise_for_status(request)
     27 error_code = request.headers["X-Error-Code"]
     28 if error_code == "RepoNotFound":
---> 29     raise RepositoryNotFoundError(
     30         f"404 Client Error: Repository Not Found for url: {request.url}"
     31     )
     32 elif error_code == "EntryNotFound":
     33     raise EntryNotFoundError(
     34         f"404 Client Error: Entry Not Found for url: {request.url}"
     35     )

RepositoryNotFoundError: 404 Client Error: Repository Not Found for url: https://huggingface.co/api/models/random_model
An example for cached_download

cached_download is an interesting test case, as it can raise all three of the new errors. Right now, on main, each of the following calls fails with the same error message:

In [1]: cached_download(hf_hub_url('<bad_repo_id>', 'config.json'))
In [1]: cached_download(hf_hub_url('lysandre/sharded-repo', '<bad_file>'))
In [1]: cached_download(hf_hub_url('lysandre/sharded-repo', 'config.json', revision='<bad_revision>'))
In [1]: cached_download('https://huggingface.co/random_org/random_repo', 'file.json')

[...]

File ~/Workspaces/Python/huggingface_hub/.env/lib/python3.10/site-packages/requests/models.py:960, in Response.raise_for_status(self)
    957     http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
    959 if http_error_msg:
--> 960     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: <URL>

This PR will now provide tailored error messages:

Wrong repo ID

In [1]: cached_download(hf_hub_url('<bad_repo_id>', 'config.json'))

[...]

File ~/Workspaces/Python/huggingface_hub/src/huggingface_hub/utils/_errors.py:29, in _raise_for_status(request)
     27 error_code = request.headers["X-Error-Code"]
     28 if error_code == "RepoNotFound":
---> 29     raise RepositoryNotFoundError(
     30         f"404 Client Error: Repository Not Found for url: {request.url}"
     31     )
     32 elif error_code == "EntryNotFound":
     33     raise EntryNotFoundError(
     34         f"404 Client Error: Entry Not Found for url: {request.url}"
     35     )

RepositoryNotFoundError: 404 Client Error: Repository Not Found for url: https://huggingface.co/%3Cbad_repo%3E/resolve/main/config.json

Wrong file

In [1]: cached_download(hf_hub_url('lysandre/sharded-repo', '<bad_file>'))

[...]

File ~/Workspaces/Python/huggingface_hub/src/huggingface_hub/utils/_errors.py:33, in _raise_for_status(request)
     29     raise RepositoryNotFoundError(
     30         f"404 Client Error: Repository Not Found for url: {request.url}"
     31     )
     32 elif error_code == "EntryNotFound":
---> 33     raise EntryNotFoundError(
     34         f"404 Client Error: Entry Not Found for url: {request.url}"
     35     )
     36 elif error_code == "RevisionNotFound":
     37     raise RevisionNotFoundError(
     38         f"404 Client Error: Revision Not Found for url: {request.url}"
     39     )

EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/lysandre/sharded-repo/resolve/main/%3Cbad%3E

Wrong revision

In [1]: cached_download(hf_hub_url('lysandre/sharded-repo', 'config.json', revision='<bad_revision>'))
In [2]: cached_download(hf_hub_url('lysandre/sharded-repo', 'config.json', revision='bad'))

[...]

File ~/Workspaces/Python/huggingface_hub/src/huggingface_hub/utils/_errors.py:37, in _raise_for_status(request)
     33         raise EntryNotFoundError(
     34             f"404 Client Error: Entry Not Found for url: {request.url}"
     35         )
     36     elif error_code == "RevisionNotFound":
---> 37         raise RevisionNotFoundError(
     38             f"404 Client Error: Revision Not Found for url: {request.url}"
     39         )
     41 request.raise_for_status()

RevisionNotFoundError: 404 Client Error: Revision Not Found for url: https://huggingface.co/lysandre/sharded-repo/resolve/bad/config.json
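
The three tailored errors above all come from one dispatch on the `X-Error-Code` response header. The following is a minimal, self-contained sketch of that idea, not the PR's code: `HubHTTPError` is a stand-in for `requests.HTTPError`, `FakeResponse` is a hypothetical response object, and the lookup table is an assumption that replaces the `if`/`elif` chain shown in the tracebacks.

```python
from dataclasses import dataclass, field


class HubHTTPError(Exception):
    """Stand-in for requests.HTTPError so the sketch runs without requests."""


class RepositoryNotFoundError(HubHTTPError):
    """The repository was not found (or is not accessible)."""


class RevisionNotFoundError(HubHTTPError):
    """The revision was not found."""


class EntryNotFoundError(HubHTTPError):
    """The file (entry) was not found."""


# Header value -> (exception class, label used in the message).
ERRORS_BY_CODE = {
    "RepoNotFound": (RepositoryNotFoundError, "Repository Not Found"),
    "RevisionNotFound": (RevisionNotFoundError, "Revision Not Found"),
    "EntryNotFound": (EntryNotFoundError, "Entry Not Found"),
}


@dataclass
class FakeResponse:
    """Hypothetical response exposing only what the sketch needs."""

    url: str
    status_code: int = 404
    headers: dict = field(default_factory=dict)


def raise_for_status(response):
    """Raise a specific error when the server sent a known X-Error-Code."""
    code = response.headers.get("X-Error-Code")
    if code in ERRORS_BY_CODE:
        error_cls, label = ERRORS_BY_CODE[code]
        raise error_cls(f"404 Client Error: {label} for url: {response.url}")
    if response.status_code >= 400:
        # Unknown or missing code: fall back to a generic error.
        raise HubHTTPError(
            f"{response.status_code} Client Error for url: {response.url}"
        )
```

Because every specific error subclasses the generic one in this sketch, callers that already catch the generic error keep working, while new callers can catch the narrower class they care about.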

Member Author

LysandreJik commented May 18, 2022

This PR can be independent of the cache-file-layout branch, so I'll switch the target to main and rebase cache-file-layout on this branch to benefit from the fix.

@LysandreJik LysandreJik changed the base branch from cache-file-layout to main May 18, 2022 21:41
Comment on lines 32 to 64
    elif error_code == "EntryNotFound":
        raise EntryNotFoundError(
            f"404 Client Error: Entry Not Found for url: {request.url}"
        )
    elif error_code == "RevisionNotFound":
        raise RevisionNotFoundError(
            f"404 Client Error: Revision Not Found for url: {request.url}"
        )
Member

nit, but I would swap the order of those two, as EntryNotFound implies that a revision was found

i.e. the errors get progressively "closer to being successful"

Contributor

adrinjalali left a comment

Otherwise LGTM.

Comment on lines 6 to 32
Raised when trying to access a hf.co URL with an invalid repository name, or
with a private repo name the user does not have access to.
"""
Contributor

The first line should be a short single-line explanation, then a longer description if necessary; an example of when the exception would be raised would also be nice.

Member Author

Good call! Updated in latest commit
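
For illustration, a docstring in the shape requested above (one-line summary, longer description, example) could look like the sketch below. The summary line is an assumed wording, the base class is a local stand-in for `requests.HTTPError`, and the example output is copied from the traceback in the PR description; it is illustrative and not meant to be executed as a doctest.

```python
class HTTPError(Exception):
    """Stand-in for requests.HTTPError so the sketch has no dependencies."""


class RepositoryNotFoundError(HTTPError):
    """Raised when a repository cannot be found on the Hub.

    Raised when trying to access a hf.co URL with an invalid repository
    name, or with a private repo name the user does not have access to.

    Example:
        >>> from huggingface_hub import model_info
        >>> model_info("random_model")
        RepositoryNotFoundError: 404 Client Error: Repository Not Found for url: https://huggingface.co/api/models/random_model
    """
```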

"""


class EntryNotFoundError(HTTPError):
Contributor

Should this maybe be FileNotFoundError?

Member Author

Will let @julien-c comment on the choice of "entry", but I'd rather stay as close as possible to the error thrown by the backend

Member

no strong opinion! up to you.

Maybe we can document that this means that the repo AND the revision exist/are accessible, BUT not the file/entry? or is it clear enough already?
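
One way to make that point explicit is to state, for each error, what is known to exist. The sketch below is an illustration under assumptions: the exception classes are local stand-ins named after the PR's classes, `what_exists` is a hypothetical helper that is not part of huggingface_hub, and it assumes the server checks the repo first, then the revision, then the entry.

```python
class RepositoryNotFoundError(Exception):
    """Local stand-in named after the PR's class."""


class RevisionNotFoundError(Exception):
    """Local stand-in named after the PR's class."""


class EntryNotFoundError(Exception):
    """Local stand-in named after the PR's class."""


def what_exists(error):
    """Map an error to what is known about the repo, revision and entry.

    True means found, False means not found (or not accessible), and None
    means unknown because the lookup stopped earlier.
    """
    if isinstance(error, RepositoryNotFoundError):
        return {"repo": False, "revision": None, "entry": None}
    if isinstance(error, RevisionNotFoundError):
        return {"repo": True, "revision": False, "entry": None}
    if isinstance(error, EntryNotFoundError):
        return {"repo": True, "revision": True, "entry": False}
    raise TypeError(f"unexpected error type: {type(error).__name__}")
```

Read top to bottom, the three branches follow the "closer to being successful" ordering suggested earlier in the review.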

@LysandreJik LysandreJik added this to the v0.8 milestone Jun 10, 2022
@LysandreJik LysandreJik mentioned this pull request Jun 10, 2022
@LysandreJik
Member Author

Will adapt to the new 401 error.

@LysandreJik
Member Author

@SBrandeis, @adrinjalali, could you verify and confirm this looks good to you?

Contributor

adrinjalali left a comment

Otherwise LGTM!

@@ -1180,6 +1181,18 @@ def model_info(

Returns:
[`huggingface_hub.hf_api.ModelInfo`]: The model repository information.

<Tip>
Contributor

should this be a tip or in the Raises section? (same for the following changes)

Member Author

So far we have been using <Tip> across the codebase as Raises wasn't working with doc-builder. It's my fault, it seems I was delaying huggingface/doc-builder#141.

If it's fine with you, I'll merge it like this and open an issue to modify all of those <Tip> to Raises: once the PR above is merged.

url = hf_hub_url("bert-base", filename="pytorch_model.bin")
with self.assertRaisesRegex(
    RepositoryNotFoundError, "401 Client Error: Repository Not Found"
):
    _ = cached_download(url, legacy_cache_layout=True)
Contributor

Note the behavior changes when authenticated - in that case the error message will be:
404 Client Error: Repository Not Found
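
A test that should pass whether or not the runner is authenticated could match both status codes. The sketch below shows one way to do that, assuming the two messages quoted in this thread; `fake_download` is a hypothetical stand-in for `cached_download` on a missing repo, and the exception class is a local stand-in.

```python
import unittest


class RepositoryNotFoundError(Exception):
    """Local stand-in named after the PR's class."""


def fake_download(authenticated):
    """Hypothetical stand-in for cached_download on a missing repo."""
    # Per the review comment: 404 when authenticated, 401 otherwise.
    status = 404 if authenticated else 401
    raise RepositoryNotFoundError(
        f"{status} Client Error: Repository Not Found for url: <URL>"
    )


class RepoNotFoundTest(unittest.TestCase):
    def test_message_with_and_without_auth(self):
        # Accept either status code so the test does not depend on a token.
        pattern = r"40[14] Client Error: Repository Not Found"
        for authenticated in (False, True):
            with self.subTest(authenticated=authenticated):
                with self.assertRaisesRegex(RepositoryNotFoundError, pattern):
                    fake_download(authenticated)
```

The trade-off is a looser assertion: the test no longer proves which of the two codes was returned, only that the error type and wording are right.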

@SBrandeis
Contributor

LGTM 👍

Let's maybe add a comment stating that http_get / cached_download should only be used to request stuff related to a repo? (because otherwise raising RepositoryNotFoundError does not make a lot of sense)

Or maybe add an additional check in _raise_for_status to that end?

Feel free to disregard if you think it's not relevant :)

Comment on lines +55 to +76
if "X-Error-Code" in request.headers:
    error_code = request.headers["X-Error-Code"]
    if error_code == "RepoNotFound":
        raise RepositoryNotFoundError(
            f"404 Client Error: Repository Not Found for url: {request.url}. "
            "If the repo is private, make sure you are authenticated."
        )
    elif error_code == "RevisionNotFound":
        raise RevisionNotFoundError(
            f"404 Client Error: Revision Not Found for url: {request.url}"
        )
    elif error_code == "EntryNotFound":
        raise EntryNotFoundError(
            f"404 Client Error: Entry Not Found for url: {request.url}"
        )

if request.status_code == 401:
    # The repo was not found and the user is not Authenticated
    raise RepositoryNotFoundError(
        f"401 Client Error: Repository Not Found for url: {request.url}. "
        "If the repo is private, make sure you are authenticated."
    )
Member

julien-c commented Jun 11, 2022

Logic of _raise_for_status LGTM (also cc @Pierrci @allendorf @coyotte508 for a double check)

Member

BTW we should maybe have "extended" i.e. wrapped the existing HTTPError instead of re-instantiating a new one.

This change broke this code in Speechbrain:
https://github.com/speechbrain/speechbrain/blob/d6bfe138a90dff3490f7196acbd9c62939289d46/speechbrain/pretrained/fetching.py#L115-L121

cc @osanseviero @mravanelli
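
A rough sketch of what "wrapping" could mean here: let the original error be raised first, then re-raise a refined subclass that keeps the original message, the response, and the original exception as its cause. This is one possible reading of the comment, not the fix that was adopted; `HTTPError` and `FakeResponse` are local stand-ins so the example runs without `requests`, and `raise_for_status_wrapped` is a hypothetical name.

```python
class HTTPError(Exception):
    """Stand-in for requests.HTTPError: carries the response that failed."""

    def __init__(self, message, response=None):
        super().__init__(message)
        self.response = response


class RepositoryNotFoundError(HTTPError):
    """Local stand-in named after the PR's class."""


class FakeResponse:
    """Hypothetical response with only the attributes used below."""

    def __init__(self, status_code, url, headers=None):
        self.status_code = status_code
        self.url = url
        self.headers = headers or {}

    def raise_for_status(self):
        if self.status_code >= 400:
            raise HTTPError(
                f"{self.status_code} Client Error for url: {self.url}",
                response=self,
            )


def raise_for_status_wrapped(response):
    """Refine the generic error while keeping it, and the response, attached."""
    try:
        response.raise_for_status()
    except HTTPError as err:
        if response.headers.get("X-Error-Code") == "RepoNotFound":
            # `from err` keeps the original as __cause__; passing `response`
            # keeps `.response` usable for callers that inspect it.
            raise RepositoryNotFoundError(str(err), response=err.response) from err
        raise
```

With this shape, downstream code that catches the generic error and inspects its message or `.response` sees the same data as before, while newer code can catch the subclass.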

Member

Also the 404 shouldn't be hardcoded but should ideally be replaced by {request.status_code} to reflect the actual server output
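
As a small illustration of that suggestion, the message can be built from the status code the server returned rather than a literal. `not_found_message` is a hypothetical helper name used only for this sketch.

```python
def not_found_message(status_code, label, url):
    """Build the error message from the status the server actually returned."""
    return f"{status_code} Client Error: {label} for url: {url}"


# The same helper covers both the authenticated (404) and the
# unauthenticated (401) case without a hardcoded status.
message_404 = not_found_message(404, "Repository Not Found", "<URL>")
message_401 = not_found_message(401, "Repository Not Found", "<URL>")
```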

Member

see #924

@LysandreJik
Member Author

I added some comments @SBrandeis. Should be good to merge.

@SBrandeis SBrandeis self-requested a review June 13, 2022 07:25
@julien-c julien-c merged commit d61d6ec into huggingface:main Jun 13, 2022
TristanThrush pushed a commit that referenced this pull request Jun 19, 2022
* Add errors

* Style

* Review

* hf_hub_download support

* Errors for hf_hub_download

* Typo

* Handle 401 error

* Tests for 401 error

* Typo

* Review comments
adrinjalali pushed a commit that referenced this pull request Jun 21, 2022
* added autoeval fields to repocard types

* modified test, linted

* Fix typo (#902)

* Refine 404 errors (#878)

* Add errors

* Style

* Review

* hf_hub_download support

* Errors for hf_hub_download

* Typo

* Handle 401 error

* Tests for 401 error

* Typo

* Review comments

* added documentation for the repocard change in repocard, updated metadata_eval_result, added documentation

* 🏗 Use `hub-ci` for tests (#898)

* 🏗 Use `hub-ci` for tests

cc @XciD

* 🩹 Also update URL for staging mode

* ✅ 401 is raised when the user is not authenticated

* 🗑 Deprecare `identical_ok`

* Longer deprecation period

* ✅ Fix the last failing test

* Warning match docstring

* FIX Avoid creating repository when it exists on remote (#900)

* fix for spaces

* fix for spaces

* removed creating repository and added warning

* revert my changes

* added tests

* removed debugger 😐

* fixed repository removal

* Added tests and error

* import pytest

* fixed tests

* fixed tests

* style

* removed repo removal

* make style

* fixed test with Lysandres patch

* added fix

* Remove deprecations (#910)

* Remove deprecations

* Typo

* Update src/huggingface_hub/README.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add request ID to all requests (#909)

* Add request ID to all requests

* Typo

* Typo

* Review comments

* Update src/huggingface_hub/utils/_errors.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Invert deprecation for create_repo (#912)

* Invert deprecation for create_repo

* Set to 0.10

* Constant was accidentally removed during deprecation transition (#913)

Co-authored-by: Lyle Nel <lyle@owlin.com>

* Update src/huggingface_hub/repocard_types.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update repocard_types.py

* Update repocard_types.py

* reorded metadata_eval_request_docs, added metrics_config and metrics_verified to tests

Co-authored-by: lsb <leebutterman@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lyle Nel <lyle-nel@users.noreply.github.com>
Co-authored-by: Lyle Nel <lyle@owlin.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

6 participants