Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Properly raise FileNotFound even if the dataset is private #4536

Merged
merged 6 commits into from
Jun 28, 2022

Conversation

lhoestq
Copy link
Member

@lhoestq lhoestq commented Jun 21, 2022

tests/test_load.py::test_load_streaming_private_dataset was failing because the hub now returns 401 when getting the HfApi.dataset_info of a dataset without authentication. load_dataset was raising ConnectionError, while it should be FileNoteFoundError since it first checks for local files before checking the Hub.

Moreover when use_auth_token is not set (default is False), we should not pass token=None to HfApi.dataset_info, or it will use the local token by default - instead it should use no token. It's currently not possible to ask for no token to be used, so as a workaround I simply set token="no-token"

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 21, 2022

The documentation is not available anymore as the PR was closed or merged.

@@ -783,7 +783,7 @@ def get_module(self) -> DatasetModule:
hfh_dataset_info = HfApi(config.HF_ENDPOINT).dataset_info(
self.name,
revision=self.revision,
token=token,
token=token if token else "no-token",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would passing token=False work instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also don't hesitate to ping @SBrandeis / @coyotte508 / @Pierrci on those kind of PRs =)

Copy link
Member Author

@lhoestq lhoestq Jun 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the source code of dataset_info and it would use token="False" and pass Bearer False for the authentication x) so yes it would work

Though the type hint requires token to be a string, not a boolean. So unless we're ok to say that the type hint can be ignored, I'll keep "no-token"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"no-token" won't trigger the type checker so I think it's better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If HFH maintainers prefer "no-token", then this is OK! ;)

@lhoestq lhoestq marked this pull request as ready for review June 23, 2022 15:51
@lhoestq lhoestq requested a review from albertvillanova June 23, 2022 16:50
Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Good catch! ;)

@@ -783,7 +783,7 @@ def get_module(self) -> DatasetModule:
hfh_dataset_info = HfApi(config.HF_ENDPOINT).dataset_info(
self.name,
revision=self.revision,
token=token,
token=token if token else "no-token",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If HFH maintainers prefer "no-token", then this is OK! ;)

@lhoestq lhoestq merged commit aa555a2 into master Jun 28, 2022
@lhoestq lhoestq deleted the properly-raise-filenotfound-even-if-private-repo branch June 28, 2022 10:36
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants