1025 add time details to scan cache #1045

Merged
merged 8 commits into from
Sep 13, 2022

@Wauplin Wauplin commented Sep 12, 2022

Add last_modified and last_accessed info to the cache scanner. Both values are scanned as floats (timestamps) and displayed as "timesince" strings (e.g. "2 weeks ago").

This is preliminary work for #1025.

The implementation is based on Python's os.stat function and is applied only to the blob files themselves, not to the symlinks (which are more platform-dependent).
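
As a rough illustration of the approach described above (not the code from this PR), reading the timestamps from the blob rather than from the symlink could look like the sketch below. The helper name blob_times is hypothetical; only pathlib, os and tempfile from the standard library are used.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def blob_times(path: Path) -> Tuple[float, float]:
    """Return (last_accessed, last_modified) as float timestamps.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch, not part of huggingface_hub.
    """
    # resolve() follows symlinks, so the stat below is taken on the
    # blob file itself and not on the pointer that links to it.
    blob_stat = Path(path).resolve().stat()
    return blob_stat.st_atime, blob_stat.st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        blob = Path(tmp) / "blob"
        blob.write_bytes(b"weights")
        link = Path(tmp) / "pointer"
        os.symlink(blob, link)  # may need extra privileges on Windows
        # Both paths should report the timestamps of the same blob.
        print(blob_times(link) == blob_times(blob))
```

Keep in mind that st_atime depends on how the filesystem is mounted (for example relatime or noatime on Linux), so the last-accessed value may be coarser than the last-modified one.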

REPO ID                       REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS            LOCAL PATH                                                                   
----------------------------- --------- ------------ -------- ------------- ------------- --------------- ---------------------------------------------------------------------------- 
chrisjay/crowd-speech-africa  dataset         761.7M     4269 4 days ago    4 days ago    main            /home/wauplin/.cache/huggingface/hub/datasets--chrisjay--crowd-speech-africa 
oscar                         dataset           3.3M        3 3 days ago    3 days ago    main            /home/wauplin/.cache/huggingface/hub/datasets--oscar                         
wikiann                       dataset         804.1K      180 1 week ago    1 week ago    main            /home/wauplin/.cache/huggingface/hub/datasets--wikiann                       
z-uo/male-LJSpeech-italian    dataset           5.5G     9409 4 days ago    4 days ago    main            /home/wauplin/.cache/huggingface/hub/datasets--z-uo--male-LJSpeech-italian   
chrisjay/crowd-speech-africa  model             1.2K        1 4 days ago    4 days ago    main            /home/wauplin/.cache/huggingface/hub/models--chrisjay--crowd-speech-africa   
datasets/lhoestq/custom_squad model             4.9M        2 2 weeks ago   2 weeks ago                   /home/wauplin/.cache/huggingface/hub/models--datasets--lhoestq--custom_squad 
gpt2                          model             3.1G       14 1 week ago    1 week ago    main, refs/pr/1 /home/wauplin/.cache/huggingface/hub/models--gpt2                            
osanseviero/flair_test3       model            714.0        1 2 weeks ago   2 weeks ago   main            /home/wauplin/.cache/huggingface/hub/models--osanseviero--flair_test3        

Done in 1.1s. Scanned 8 repo(s) for a total of 9.4G.
Got 2875 warning(s) while scanning. Use -vvv to print details.
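
For readers wondering how a float timestamp becomes a string such as "4 days ago" in the LAST_ACCESSED and LAST_MODIFIED columns, here is a minimal sketch. It is an assumption about the general technique, not the formatter merged in this PR: the function name format_timesince, the unit thresholds and the "a few seconds ago" fallback are all chosen for illustration.

```python
import time
from typing import Optional

# Units from largest to smallest; a month is approximated as 30 days
# and a year as 365 days (illustrative choices for this sketch).
_UNITS = [
    ("year", 365 * 24 * 3600),
    ("month", 30 * 24 * 3600),
    ("week", 7 * 24 * 3600),
    ("day", 24 * 3600),
    ("hour", 3600),
    ("minute", 60),
]


def format_timesince(ts: float, now: Optional[float] = None) -> str:
    """Format a timestamp as a relative "N units ago" string."""
    now = time.time() if now is None else now
    # Clamp to zero so a timestamp slightly in the future does not
    # produce a negative duration.
    delta = max(0.0, now - ts)
    for label, seconds in _UNITS:
        count = int(delta // seconds)
        if count >= 1:
            plural = "s" if count > 1 else ""
            return f"{count} {label}{plural} ago"
    return "a few seconds ago"
```

Passing now explicitly keeps the function deterministic in tests; in the CLI it would simply default to the current time.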

HuggingFaceDocBuilderDev commented Sep 12, 2022

The documentation is not available anymore as the PR was closed or merged.

codecov bot commented Sep 12, 2022

Codecov Report

Merging #1045 (56c4187) into main (b9d8617) will increase coverage by 0.07%.
The diff coverage is 93.93%.

❗ Current head 56c4187 differs from pull request most recent head cb8e347. Consider uploading reports for the commit cb8e347 to get more accurate results

@@            Coverage Diff             @@
##             main    #1045      +/-   ##
==========================================
+ Coverage   83.69%   83.76%   +0.07%     
==========================================
  Files          37       37              
  Lines        3919     3949      +30     
==========================================
+ Hits         3280     3308      +28     
- Misses        639      641       +2     
Impacted Files Coverage Δ
src/huggingface_hub/commands/cache.py 82.35% <ø> (ø)
src/huggingface_hub/utils/_cache_manager.py 95.61% <93.93%> (-0.35%) ⬇️

Member

@LysandreJik LysandreJik left a comment

This looks good to me. Thanks!

Nitpick: it ends up taking a bit of space in my terminal; I wonder whether we should put it behind a verbose flag?

t5-small model 970.7M 11 refs/pr/1, main /Users/lucain/.cache/huggingface/hub/models--t5-small
REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
--------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
Member

Did you switch to Linux in between PRs? 😄

Contributor Author

Ahah, yes 😁

Contributor Author

Wauplin commented Sep 13, 2022

@LysandreJik thanks for the review!

About the nitpick: I agree that it's not optimal. There is another issue (#1024) about adding options to this command for CSV-like output and for selecting columns. I think it makes sense to address that in a follow-up PR. (By default, I would also remove the "LOCAL PATH" column, which can be very long and takes far too much space.)

@Wauplin Wauplin merged commit b35f817 into main Sep 13, 2022
@Wauplin Wauplin deleted the 1025-add-time-details-to-scan-cache branch September 13, 2022 15:46