Store cached models in location compatible with huggingface-cli #1663

Closed
pcuenca opened this issue Dec 12, 2022 · 21 comments · Fixed by #2005

@pcuenca
Member

pcuenca commented Dec 12, 2022

Reference: huggingface/huggingface_hub#1259

huggingface-cli scan-cache doesn't see cached diffusers models. Are there any drawbacks to changing the cache folder?

@patrickvonplaten
Contributor

Yes, we could indeed change the cache here! This was a copy-paste back then from the transformers repo! Would you like to open a PR to change the default cache, @pcuenca? Otherwise I'm happy to look into it in a couple of days.

@patrickvonplaten patrickvonplaten self-assigned this Dec 12, 2022
@Wauplin
Collaborator

Wauplin commented Dec 12, 2022

I support the idea of moving everything under $HF_HOME/hub/ so that users can benefit from the huggingface_hub tooling. It would also help if a user downloads some weights outside of diffusers for some reason: the cache would be shared, which is not the case at the moment.

Also note that if some things are cached that don't come from the Hub (preprocessed data, community weights, ...), it would make sense to use the assets cache (see docs). I can help with that if needed :)
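
For illustration, a minimal sketch of the kind of tooling a shared cache would unlock, using huggingface_hub's scan_cache_dir (the same functionality behind huggingface-cli scan-cache):

from huggingface_hub import scan_cache_dir

# Inspect everything in the shared hub cache, regardless of which library
# (transformers, diffusers, ...) downloaded it.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.repo_type, repo.size_on_disk)

At the moment this does not see diffusers downloads, which is exactly the point of this issue.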

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 11, 2023
@patrickvonplaten patrickvonplaten removed the stale Issues that haven't received updates label Jan 12, 2023
@patrickvonplaten
Contributor

cc @pcuenca would you like to tackle this one? Otherwise, I should be able to find some time

@pcuenca
Member Author

pcuenca commented Jan 13, 2023

Yeah, I'll take a look today.

@github-actions github-actions bot added the stale Issues that haven't received updates label Feb 6, 2023
@pcuenca pcuenca removed the stale Issues that haven't received updates label Feb 6, 2023
@huggingface huggingface deleted a comment from github-actions bot Feb 16, 2023
@patrickvonplaten
Contributor

Think this is still relevant

@alexcoca

alexcoca commented May 9, 2023

@patrickvonplaten is there an elegant mechanism for loading models that have been cached before? For example, something happened with the Hub (https://twitter.com/huggingface/status/1655760648926642178) and the normal .from_pretrained machinery does not work. One can of course always assume the worst, copy the model assets from Hugging Face once downloaded, and fall back to those assets if there are issues with the Hub. But wouldn't it be better if from_pretrained checked whether the model assets are in .cache and loaded them, instead of throwing an OSError? Is this even possible? (I haven't looked into how the cache really works.)

@Wauplin
Collaborator

Wauplin commented May 9, 2023

@alexcoca In general, .from_pretrained will check whether a new version of the model exists on the Hub. At the moment the HF Hub is having some trouble resolving this (we are working hard on fixing it!). If you don't want to depend on the Hub at all, you can pass local_files_only=True. Of course, the weights must have been downloaded at least once.
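
For example, a minimal sketch of the offline path for a diffusers pipeline, assuming the weights were downloaded at least once:

from diffusers import DiffusionPipeline

# Resolve everything from the local cache; this raises an error if the
# files were never downloaded, but makes no network calls to the Hub.
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", local_files_only=True
)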

@alexcoca

alexcoca commented May 9, 2023

@Wauplin, this is a great idea and I had tried it, but passing local_files_only=True to the from_pretrained call above doesn't solve the issue. There is a cache containing all the assets, but it's buried in something along the lines of

/Users/alexandrucoca/.cache/huggingface/hub/models--sentence-transformers--all-distilroberta-v1/snapshots/57dd5d5be528ba968ef928103d92f95afc487e68

There should be an easy way to default to using an existing cache, and I imagine that is what local_files_only is for. Why does it not work in my case?

I can of course pass that cache path myself as the model_name_or_path to solve the issue, but shouldn't local_files_only do this for us?

@Wauplin
Collaborator

Wauplin commented May 9, 2023

Why does it not work in my case?

🤔
Can you provide the exact snippet of code you are using here?

@alexcoca

alexcoca commented May 9, 2023

from transformers import AutoModel, AutoTokenizer  # imports live elsewhere in the file

        self._tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
        self._model = AutoModel.from_pretrained(model_name, local_files_only=True)

with model_name="sentence-transformers/all-distilroberta-v1"

@alexcoca

alexcoca commented May 9, 2023

I can't see how this would work... the code ends up in configuration_utils.py at L628, and then we later end up in hub.py. It seemed there might have been hope at L393, since that tries to resolve the model name by calling try_to_load_from_cache, but it is not called because _commit_hash is None. Then I end up calling hf_hub_download, which fails.

@Wauplin, I did some further stepping through the code and found some unexpected things:

  • in file_download.py, after L1229-1236, commit_hash is None because L1233 fails to detect ref_path as a file, although it is a file that just contains a commit hash, which happens to be what comes after snapshots in the path I linked above.
  • even if commit_hash had been resolved correctly, pointer_path is wrong: '/Users/alexandrucoca/.cache/huggingface/hub/models--sentence-transformers--all-distilroberta-v1/models--sentence-transformers--all-distilroberta-v1/snapshots/57dd5d5be528ba968ef928103d92f95afc487e68/config.json' (note how the models--... segment appears twice).

Why is this the case?

Update: it turns out this was because HUGGINGFACE_HUB_CACHE was set incorrectly.
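
For reference, a quick way to sanity-check which cache directory huggingface_hub actually resolves (a misconfigured HUGGINGFACE_HUB_CACHE shows up here); the repo id below is just the one from this thread:

from huggingface_hub import try_to_load_from_cache
from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE

# The cache root resolved from HF_HOME / HUGGINGFACE_HUB_CACHE.
print(HUGGINGFACE_HUB_CACHE)

# Returns a local path if the file is in the cache, None if it is not.
print(try_to_load_from_cache(
    repo_id="sentence-transformers/all-distilroberta-v1",
    filename="config.json",
))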

@Wauplin
Collaborator

Wauplin commented May 9, 2023

@alexcoca given the update, does it mean this is now solved on your side? Meaning local_files_only=True works no matter the connection/Hub status?

@alexcoca

alexcoca commented May 9, 2023

@Wauplin, yes, it worked for me because I had the cached files handy 👍 :)

@alexblattner

Is it possible to choose the location of the cache?

@Wauplin
Collaborator

Wauplin commented Jul 18, 2023

@alexblattner Yes, you can configure it by setting the HF_HOME or HUGGINGFACE_HUB_CACHE environment variables. See the reference docs for more details.
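
A minimal sketch of doing this from Python rather than the shell (the path is a hypothetical placeholder); the variable must be set before diffusers / huggingface_hub are imported, because the default cache location is resolved at import time:

import os

os.environ["HF_HOME"] = "/mnt/disk_a/hf_home"  # hypothetical path

from diffusers import DiffusionPipeline  # import only after setting the variable

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")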

@alexblattner

alexblattner commented Jul 18, 2023

@Wauplin is there a way to do that without setting the environment variable? I essentially want to be able to store some LoRAs on external disk A and some on external disk B. Changing the environment variable all the time seems wrong to me.

Of course, these external disks are connected.

@Wauplin
Collaborator

Wauplin commented Jul 18, 2023

@alexblattner Then the cache_dir parameter is the way to go :) Environment variables are convenient in most cases, since most users expect their cache to live on a single hard drive.

@alexblattner

@Wauplin thanks a lot! Could you give me a very basic diffusers example that uses that? Thanks in advance!

@Wauplin
Collaborator

Wauplin commented Jul 19, 2023

could you give me a very basic diffusers example that uses that

@alexblattner Something like this:

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", cache_dir="path/to/cache")

should work :)
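
For the two-disk scenario mentioned above, the same cache_dir argument can point each call at a different location (the paths and the second model id are hypothetical placeholders):

from diffusers import DiffusionPipeline

pipe_a = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", cache_dir="/mnt/disk_a/hf_cache"
)
pipe_b = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", cache_dir="/mnt/disk_b/hf_cache"
)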

@alexblattner

@Wauplin thanks a lot!
