Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Use a finer exception when local_files_only=True and a file is missing in cache #979

Closed
sgugger opened this issue Aug 9, 2022 · 8 comments · Fixed by #985
Closed

Use a finer exception when local_files_only=True and a file is missing in cache #979

sgugger opened this issue Aug 9, 2022 · 8 comments · Fixed by #985
Assignees

Comments

@sgugger
Copy link
Contributor

sgugger commented Aug 9, 2022

In Transformers we sometime try to access files that are not in a repo, catch the error (EntryNotFoundError for distant repos) and try a different file. For instance most models have a single weight file named pytorch_model.bin but some models have several checkpoint files and an index since they are extremely big. For those, there is no pytorch_model.bin but a pytorch_model.bin.index.json.

Therefore, the logic in from_pretrained is to look at pytorch_model.bin first, and if it's not there, at pytorch_model.bin.index.json. Now when we have an internet connection, we can catch the EntryNotFoundError and all is fine. When there is no internet or the user decided to activate the offline mode however, hf_hub_download returns a ValueError, but it also returns a ValueError in many different situations, so we need to catch it and match the error message in Transformers which is not very clean, and very prone to breaking in the future.

It would be much nicer if the error raised was more specific, like a FileNotFoundError or any subclass of ValueError that would only be raised in this specific situation.

@julien-c
Copy link
Member

julien-c commented Aug 9, 2022

i saw this being discussed with a community member in transformers, mind linking the convo here?

@sgugger
Copy link
Contributor Author

sgugger commented Aug 9, 2022

I haven't seen it discussed anywhere personally, this stems from this PR (bug reported by Stas on slack) and I was pointing at it on this comment.

@julien-c
Copy link
Member

julien-c commented Aug 9, 2022

yes that's what i meant, thanks!

@LysandreJik
Copy link
Member

cc @Wauplin if you have an opinion on the error handling currently done in huggingface_hub :)

@Wauplin
Copy link
Contributor

Wauplin commented Aug 10, 2022

Request seems legitimate :)

By quickly looking at the code, error handling already looks good with the refined HTTPError (RepositoryNotFoundError, RevisionNotFoundError and EntryNotFoundError). I don't see any problem doing the same to refine the ValueError. We just have to agree on the naming/inheritance of them.

I see 2 cases where we should keep the base ValueError as it's really an error from the user:

raise ValueError("We have no connection or you passed local_files_only, so  force_download is not an accepted option.")
raise ValueError(f"Invalid repo type: {repo_type}. Accepted repo types are: {str(REPO_TYPES)}")

For the 2 others cases, it's definitely the case you described where the entry is not found locally.

raise ValueError("Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.")
raise ValueError("Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.")

From a clean state, my first instinct would be to create 3 exceptions:

class EntryNotFoundError(Exception):
    (...)
    
class LocalEntryNotFoundError(EntryNotFoundError):
    (...)

class DistantEntryNotFoundError(EntryNotFoundError, HTTPError):
    (...)

But since EntryNotFoundError is already defined and derives from HTTPError, I'd say an ok-ish solution would be to create a LocalEntryNotFoundError(ValueError, EntryNotFoundError) deriving from ValueError (for backward compatibility) and EntryNotFoundError. Main issue would be to have a LocalEntryNotFoundError deriving from HTTPError even though it's not entirely network-related but I guess we can live with it.

@sgugger how would you like it ?

@sgugger
Copy link
Contributor Author

sgugger commented Aug 10, 2022

LocalEntryNotFoundError(ValueError, EntryNotFoundError) is ideal for my use case since I won't need to add an extra catch (it's going to be caught in the earlier EntryNotFoundError) but I can also live with LocalEntryNotFoundError(ValueError) if people feel strongly about this exception deriving from an HTTPError despite being local.

@Wauplin
Copy link
Contributor

Wauplin commented Aug 10, 2022

@julien-c @LysandreJik If you are strongly opinionated against the LocalEntryNotFoundError(ValueError, EntryNotFoundError), please let me know. Otherwise let's go for it.

@julien-c
Copy link
Member

no opinion on my side!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
4 participants