Store cached models in location compatible with huggingface-cli #1663

Closed
pcuenca opened this issue Dec 12, 2022 · 21 comments · Fixed by #2005

@pcuenca
Member

pcuenca commented Dec 12, 2022

Reference: huggingface/huggingface_hub#1259

huggingface-cli scan-cache doesn't see cached diffusers models. Are there any drawbacks to changing the cache folder?

@patrickvonplaten
Contributor

Yes, we could indeed change the cache here! This was a copy-paste back then from the transformers repo! Would you like to open a PR to change the default cache, @pcuenca? Otherwise I'm happy to look into it in a couple of days.

@patrickvonplaten patrickvonplaten self-assigned this Dec 12, 2022
@Wauplin
Collaborator

Wauplin commented Dec 12, 2022

I support the idea of moving everything under $HF_HOME/hub/ so that users can benefit from the huggingface_hub tooling. It would also help if a user downloads some weights outside of diffusers for some reason: the cache would be shared, which is not the case at the moment.

Also note that if some things are cached that don't come from the Hub (preprocessed data, community weights, ...), it would make sense to use the assets cache (see docs). I can help with that if needed :)
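
For illustration, a minimal sketch of the kind of tooling a shared cache would unlock, using huggingface_hub's scan_cache_dir (the same functionality behind huggingface-cli scan-cache):

from huggingface_hub import scan_cache_dir

# Inspect everything in the shared hub cache, regardless of which library
# (transformers, diffusers, ...) downloaded it.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.repo_type, repo.size_on_disk)

At the moment this does not see diffusers downloads, which is exactly the point of this issue.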

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 11, 2023
@patrickvonplaten patrickvonplaten removed the stale Issues that haven't received updates label Jan 12, 2023
@patrickvonplaten
Contributor

cc @pcuenca would you like to tackle this one? Otherwise, I should be able to find some time

@pcuenca
Member Author

pcuenca commented Jan 13, 2023

Yeah, I'll take a look today.

@github-actions github-actions bot added the stale Issues that haven't received updates label Feb 6, 2023
@pcuenca pcuenca removed the stale Issues that haven't received updates label Feb 6, 2023
@huggingface huggingface deleted a comment from github-actions bot Feb 16, 2023
@patrickvonplaten
Contributor

Think this is still relevant

@alexcoca

alexcoca commented May 9, 2023

@patrickvonplaten is there an elegant mechanism for loading models that have been cached before? For example, something happened with the Hub (https://twitter.com/huggingface/status/1655760648926642178) and the normal .from_pretrained machinery does not work. One can of course always assume the worst, copy the model assets from Hugging Face once downloaded, and fall back to those assets if there are issues with the Hub. But wouldn't it be better if from_pretrained checked whether the model assets are in .cache and loaded them, instead of throwing an OSError? Is this even possible? (I haven't looked into how the cache really works.)

@Wauplin
Collaborator

Wauplin commented May 9, 2023

@alexcoca In general, .from_pretrained will check whether a new version of the model exists on the Hub. At the moment the HF Hub is having some trouble resolving this (we are working hard on fixing it!). If you don't want to depend on the Hub at all, you can pass local_files_only=True. Of course, the weights must have been downloaded at least once.
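
For example, a minimal sketch of the offline path for a diffusers pipeline, assuming the weights were downloaded at least once:

from diffusers import DiffusionPipeline

# Resolve everything from the local cache; this raises an error if the
# files were never downloaded, but makes no network calls to the Hub.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", local_files_only=True
)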

@alexcoca

alexcoca commented May 9, 2023

@Wauplin, this is a great idea and I had tried it, but passing local_files_only=True to the from_pretrained call above doesn't solve the issue. There is a cache containing all the assets, but it's buried in something along the lines of

/Users/alexandrucoca/.cache/huggingface/hub/models--sentence-transformers--all-distilroberta-v1/snapshots/57dd5d5be528ba968ef928103d92f95afc487e68

There should be an easy way to default to using an existing cache, and I imagine that is what local_files_only is for. Why does it not work in my case?

I can of course pass that cache path myself as the model_name_or_path to solve the issue, but shouldn't local_files_only do this for us?

@Wauplin
Collaborator

Wauplin commented May 9, 2023

Why does it not work in my case?

🤔
Can you provide the exact snippet of code you are using here?

@alexcoca

alexcoca commented May 9, 2023

from transformers import AutoModel, AutoTokenizer  # imports live elsewhere in the file

        self._tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
        self._model = AutoModel.from_pretrained(model_name, local_files_only=True)

with model_name="sentence-transformers/all-distilroberta-v1"

@alexcoca

alexcoca commented May 9, 2023

I can't see how this would work... the code ends up in configuration_utils.py at L628, and then we later end up in hub.py. It seemed there might have been hope at L393, since that tries to resolve the model name by calling try_to_load_from_cache, but it is not called because _commit_hash is None. Then I end up calling hf_hub_download, which fails.

@Wauplin, I did some further stepping through the code and found some unexpected things:

  • in file_download.py, after L1229-1236, commit_hash is None because L1233 fails to detect ref_path as a file, although it is a file that just contains a commit hash, which happens to be what comes after snapshots in the path I linked above.
  • even if commit_hash had been resolved correctly, pointer_path is wrong: '/Users/alexandrucoca/.cache/huggingface/hub/models--sentence-transformers--all-distilroberta-v1/models--sentence-transformers--all-distilroberta-v1/snapshots/57dd5d5be528ba968ef928103d92f95afc487e68/config.json' (note how the models--... segment appears twice).

Why is this the case?

Update: it turns out this was because HUGGINGFACE_HUB_CACHE was set incorrectly.
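
For reference, a quick way to sanity-check which cache directory huggingface_hub actually resolves (a misconfigured HUGGINGFACE_HUB_CACHE shows up here); the repo id below is just the one from this thread:

from huggingface_hub import try_to_load_from_cache
from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE

# The cache root resolved from HF_HOME / HUGGINGFACE_HUB_CACHE.
print(HUGGINGFACE_HUB_CACHE)

# Returns a local path if the file is in the cache, None if it is not.
print(try_to_load_from_cache(
    repo_id="sentence-transformers/all-distilroberta-v1",
    filename="config.json",
))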

@Wauplin
Collaborator

Wauplin commented May 9, 2023

@alexcoca given the update, does it mean this is now solved on your side? Meaning local_files_only=True works no matter the connection/Hub status?

@alexcoca

alexcoca commented May 9, 2023

@Wauplin, yes, it worked for me because I had the cached files handy 👍 :)

@alexblattner

Is it possible to choose the location of the cache?

@Wauplin
Collaborator

Wauplin commented Jul 18, 2023

@alexblattner Yes, you can configure it by setting the HF_HOME or HUGGINGFACE_HUB_CACHE environment variables. See the reference docs for more details.
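
A minimal sketch of doing this from Python rather than the shell (the path is a hypothetical placeholder); the variable must be set before diffusers / huggingface_hub are imported, because the default cache location is resolved at import time:

import os

os.environ["HF_HOME"] = "/mnt/disk_a/hf_home"  # hypothetical path

from diffusers import DiffusionPipeline  # import only after setting the variable

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")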

@alexblattner

alexblattner commented Jul 18, 2023

@Wauplin is there a way to do that without setting the environment variable? I essentially want to be able to store some LoRAs on external disk A and some on external disk B. Changing the environment variable all the time seems wrong to me.

Of course, these external disks are connected.

@Wauplin
Collaborator

Wauplin commented Jul 18, 2023

@alexblattner Then the cache_dir parameter is the way to go :) Environment variables are convenient in most cases, since most users expect their cache to live on a single hard drive.

@alexblattner

@Wauplin thanks a lot! Could you give me a very basic diffusers example that uses that? Thanks in advance!

@Wauplin
Collaborator

Wauplin commented Jul 19, 2023

could you give me a very basic diffusers example that uses that

@alexblattner Something like this:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", cache_dir="path/to/cache")

should work :)
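
For the two-disk scenario mentioned above, the same cache_dir argument can point each call at a different location (the paths and the second model id are hypothetical placeholders):

from diffusers import DiffusionPipeline

pipe_a = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", cache_dir="/mnt/disk_a/hf_cache"
)
pipe_b = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", cache_dir="/mnt/disk_b/hf_cache"
)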

@alexblattner

@Wauplin thanks a lot!
