Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

"no-token" is passed to huggingface_hub when token is None #4990

Closed
Wauplin opened this issue Sep 19, 2022 · 6 comments · Fixed by #5031
Closed

"no-token" is passed to huggingface_hub when token is None #4990

Wauplin opened this issue Sep 19, 2022 · 6 comments · Fixed by #5031
Assignees
Labels
bug Something isn't working

Comments

@Wauplin
Copy link
Contributor

Wauplin commented Sep 19, 2022

Describe the bug

In the 2 lines listed below, a token is passed to huggingface_hub to get information from a dataset. If no token is provided, a "no-token" string is passed. What is the purpose of it ? If no real, I would prefer if the None value could be sent directly to be handle by huggingface_hub. I feel that here it is working because we assume the token will never be validated.

token=token if token else "no-token",

token=token if token else "no-token",

Expected results

Pass token=None to huggingface_hub.

Actual results

token="no-token" is passed.

Environment info

huggingface_hub v0.10.0dev

@albertvillanova
Copy link
Member

Hi @Wauplin, thanks for raising this potential issue.

The choice of passing "no-token" instead of None was made in this PR:

According to the PR description, the reason why it is passed is to avoid that HfApi.dataset_info uses the local token when no token should be used.

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

Hi @albertvillanova , thanks for finding the original issue 👍

As of next release of huggingface_hub, the token argument will be deprecated in favor of the use_auth_token argument in dataset_info method. This change as been done by @SBrandeis in huggingface/huggingface_hub#928. use_auth_token is a bit different and allow the case "don't sent the cached token by default".

If you want to strictly avoid sending the cached token from datasets, you can use:

# token=token if token else "no-token", <- will fail because token is not valid

use_auth_token=token if token else False,  # using the new `use_auth_token` parameter

And as a note, I am currently updating the "don't send the cached token by default"-rule to "don't send the cached token on public repos by default but use it in private ones" in huggingface/huggingface_hub#1064. This will not change the fact that use_auth_token=False doesn't send the token at all.

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

What is current strategy in term of updating huggingface_hub version in datasets ? I don't want to break stuff in the next release so let's find a proper solution :)

@lhoestq
Copy link
Member

lhoestq commented Sep 20, 2022

As soon as token is deprecated and hfh has a new release, we'll update datasets to use the new argument instead. Does it sound good to you ?

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 20, 2022

Perfect 👌

@albertvillanova
Copy link
Member

Hi @Wauplin, thanks for the warning about the deprecation of token in favor of use_auth_token.

Indeed, in datasets we use internally use_auth_token, which in this case was transformed to token to call HfApi.dataset_info:

token = HfFolder.get_token() if self.download_config.use_auth_token else None

Therefore, for the new hfh release, the fix will be trivial: we will pass directly use_auth_token.

As discussed during our meeting yesterday, due to the fact that at datasets we support multiple hfh versions, I think we should handle passing token or use_auth_token depending on the hfh version.

@lhoestq lhoestq self-assigned this Sep 26, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants