hf_hub_download and cached_download should read the token by default #926

Closed
osanseviero opened this issue Jun 23, 2022 · 5 comments · Fixed by #1064
Labels
bug Something isn't working

Comments

@osanseviero
Contributor

Describe the bug

Right now, hf_hub_download does not work for private repos unless you set use_auth_token=True. This is inconsistent with other methods in which the token is automatically retrieved if not specified. I think we should change

if isinstance(use_auth_token, str):
    headers["authorization"] = f"Bearer {use_auth_token}"
elif use_auth_token:
    token = HfFolder.get_token()
    if token is None:
        raise EnvironmentError(
            "You specified use_auth_token=True, but a huggingface token was not"
            " found."
        )
    headers["authorization"] = f"Bearer {token}"
to always get the token if it's not explicitly passed.

WDYT @LysandreJik @julien-c?

Reproduction

No response

Logs

No response

System Info

-
osanseviero added the bug label on Jun 23, 2022
@julien-c
Member

we've discussed it in the past and decided against it for privacy protection reasons (especially in the case when it's used from third party libraries – the hf.co server will "know" what models user A is downloading without explicitly opting-in)

But maybe it does make sense to revisit this

@osanseviero
Contributor Author

That's the thing, a user A that wants to download a private model will need to explicitly opt-in. I'm not sure how the current setup guarantees more privacy protection than the other approach. My suggestion is to change

if isinstance(use_auth_token, str):
    headers["authorization"] = f"Bearer {use_auth_token}"
elif use_auth_token:
    token = HfFolder.get_token()
    if token is None:
        raise EnvironmentError(
            "You specified use_auth_token=True, but a huggingface token was not"
            " found."
        )

to

if isinstance(use_auth_token, str):
    headers["authorization"] = f"Bearer {use_auth_token}"

token = HfFolder.get_token()
if token is None:
    raise EnvironmentError(
        "You specified use_auth_token=True, but a huggingface token was not"
        " found."
    )

Note that the upload methods already do this automatically (as in

token, name = self._validate_or_retrieve_token(token)
), so not sure why the download should be different.
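
Read literally, the second snippet above would raise whenever no token is stored, even for a public repo, and it never copies the stored token into the headers. A minimal sketch of what the proposal seems to intend is below. `build_auth_headers` and its `stored_token` parameter are hypothetical names used only for illustration; `stored_token` stands in for the result of `HfFolder.get_token()` so the example runs without `huggingface_hub` installed.

```python
from typing import Dict, Optional, Union


def build_auth_headers(
    use_auth_token: Union[bool, str, None] = None,
    stored_token: Optional[str] = None,  # assumption: result of HfFolder.get_token()
) -> Dict[str, str]:
    """Build request headers, falling back to the stored token when available."""
    headers: Dict[str, str] = {}
    if isinstance(use_auth_token, str):
        # An explicitly passed token always takes precedence.
        headers["authorization"] = f"Bearer {use_auth_token}"
    elif stored_token is not None:
        # Proposed behavior: use the stored token even if the caller did not ask.
        headers["authorization"] = f"Bearer {stored_token}"
    elif use_auth_token is True:
        # Only fail when the caller explicitly required authentication.
        raise EnvironmentError(
            "You specified use_auth_token=True, but a huggingface token was not"
            " found."
        )
    # No token anywhere: return empty headers so public repos still work.
    return headers
```

In this sketch `use_auth_token=False` is not given a special meaning, so a stored token would still be sent; whether that is acceptable is exactly the privacy question raised earlier in the thread.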

@julien-c
Member

not sure why the download should be different

Upload methods aren't expected by the user to even work w/o authentication, but download methods are (for public repos anyways)

I will try to find the past discussions of this and link them here, for completeness.

But note that I'm not opposed to changing this behavior, just need to be aware of privacy implications
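
One way to reconcile reading the token by default with the privacy concern would be an explicit opt-out. The sketch below is an assumption, not something agreed in this thread: it treats `use_auth_token=False` as "never send a token", so that a third-party library could request anonymous downloads of public repos on the user's behalf. `resolve_token` and `stored_token` are hypothetical names.

```python
from typing import Optional, Union


def resolve_token(
    use_auth_token: Union[bool, str, None],
    stored_token: Optional[str],  # assumption: result of HfFolder.get_token()
) -> Optional[str]:
    """Return the token to send, or None for an anonymous request."""
    if use_auth_token is False:
        # Hypothetical opt-out: the server cannot tie the download to an account.
        return None
    if isinstance(use_auth_token, str):
        return use_auth_token
    if use_auth_token is True and stored_token is None:
        raise EnvironmentError(
            "You specified use_auth_token=True, but a huggingface token was not"
            " found."
        )
    # None (the default) or True: send the stored token if there is one.
    return stored_token


# Show which calls would identify the user to the server.
for value in (None, True, False, "hf_explicit"):
    sent = resolve_token(value, stored_token="hf_stored")
    mode = "authenticated" if sent is not None else "anonymous"
    print(f"use_auth_token={value!r}: {mode}")
```

With this shape, the default (`None`) covers private repos without extra arguments, while callers who care about not disclosing their identity can still ask for an anonymous request.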

@julien-c
Member

one example of internal convo but earlier discussions were probably more interesting

@julien-c
Member

julien-c commented Jun 24, 2022

@sgugger pointed me to this transformers PR for the original discussion huggingface/transformers#9141
