Releases: huggingface/huggingface_hub
Patch release v0.11.1
Hot-fix for permission issues when downloading with hf_hub_download or snapshot_download. For more details, see #1220, #1141 and #1215.
Full changelog: v0.11.0...v0.11.1
Extended HfApi, pagination, simplified login and more
New features and improvements for HfApi
HfApi is the central point to interact with the Hub API (manage repos, create commits, ...). The goal is to offer more and more git-related features through HTTP endpoints, so that users can interact with the Hub without cloning a repo locally.
Create/delete tags and branches
from huggingface_hub import create_branch, create_tag, delete_branch, delete_tag
create_tag(repo_id, tag="v0.11", tag_message="Release v0.11")
delete_tag(repo_id, tag="something") # If you created a tag by mistake
create_branch(repo_id, branch="experiment-154")
delete_branch(repo_id, branch="experiment-1") # Clean some old branches
- Add a create_tag method to create tags from the HTTP endpoint by @Wauplin in #1089
- Add delete_tag method to HfApi by @Wauplin in #1128
- Create tag twice doesn't work by @Wauplin in #1149
- Add "create_branch" and "delete_branch" endpoints by @Wauplin #1181
Upload lots of files in a single commit
Making a very large commit was previously tedious. Files are now processed in chunks, which makes it possible to upload 25k files in a single commit (with a 1Gb payload limit if uploading only non-LFS files). This should make it easier to upload large datasets.
- Create commit by streaming a ndjson payload (allow lots of file in single commit) by @Wauplin in #1117
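As a rough illustration of what "streaming a ndjson payload" in chunks can look like, here is a minimal sketch using only the standard library. The helper name iter_ndjson_chunks and the record keys are hypothetical and do not reflect the Hub's actual wire format.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_ndjson_chunks(records: Iterable[Dict], chunk_size: int = 2) -> Iterator[bytes]:
    """Yield newline-delimited JSON in chunks of at most `chunk_size` records."""
    buffer: List[str] = []
    for record in records:
        buffer.append(json.dumps(record))
        if len(buffer) == chunk_size:
            yield ("\n".join(buffer) + "\n").encode("utf-8")
            buffer = []
    if buffer:  # flush the last, possibly shorter, chunk
        yield ("\n".join(buffer) + "\n").encode("utf-8")


# Hypothetical records: these keys are illustrative, not the Hub's wire format.
records = [{"key": "file", "path": f"data/{i}.txt"} for i in range(5)]
chunks = list(iter_ndjson_chunks(records, chunk_size=2))
payload = b"".join(chunks)
decoded = [json.loads(line) for line in payload.decode("utf-8").splitlines()]
```

Because each line is a complete JSON document, a receiver can start processing before the whole payload has arrived, which is what makes very large commits practical.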
Delete an entire folder
from huggingface_hub import CommitOperationDelete, create_commit, delete_folder
# Delete a single folder
delete_folder(repo_id=repo_id, path_in_repo="logs/")
# Alternatively, use the low-level `create_commit`
create_commit(
    repo_id,
    operations=[
        CommitOperationDelete(path_in_repo="old_config.json"),  # Delete a file
        CommitOperationDelete(path_in_repo="logs/"),  # Delete a folder
    ],
    commit_message=...,
)
Support pagination when listing repos
In the future, listing models, datasets and spaces will be paginated on the Hub by default. To avoid breaking changes, huggingface_hub already follows pagination. The output type is currently a list (deprecated) and will become a generator in v0.14.
- Add support for pagination in list_models list_datasets and list_spaces by @Wauplin #1176
- Deprecate output in list_models by @Wauplin in #1143
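Since the output type is announced to change from a list to a generator, calling code may want to avoid len() and indexing. The sketch below uses a hypothetical stand-in, fake_list_models, rather than the real listing functions, to show one way to write code that works with both.

```python
from itertools import islice
from typing import Iterable, List


def fake_list_models(paginated: bool) -> Iterable[str]:
    """Hypothetical stand-in for a listing call; returns a list or a generator."""
    names = [f"user/model-{i}" for i in range(10)]
    if paginated:
        return (name for name in names)  # announced behaviour: lazy generator
    return names  # current behaviour: plain list


def first_n(items: Iterable[str], n: int) -> List[str]:
    """Take the first `n` items without relying on len() or indexing."""
    return list(islice(items, n))


top_from_list = first_n(fake_list_models(paginated=False), 3)
top_from_generator = first_n(fake_list_models(paginated=True), 3)
```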
Misc
- Allow create PR against non-main branch by @Wauplin in #1168
- 1162 Reorder operations correctly in commit endpoint by @Wauplin in #1175
Login, tokens and authentication
Authentication has been revisited to make it as easy as possible for the users.
Unified login and logout methods
from huggingface_hub import interpreter_login, login, logout, notebook_login
# `login` detects automatically if you are running in a notebook or a script
# Launch widgets or TUI accordingly
login()
# Now possible to login with a hardcoded token (non-blocking)
login(token="hf_***")
# If you want to bypass the auto-detection of `login`
notebook_login() # still available
interpreter_login() # to login from a script
# Logout programmatically
logout()
# Still possible to login from CLI
huggingface-cli login
Set token only for a HfApi session
from huggingface_hub import HfApi
# Token will be sent in every request but not stored on machine
api = HfApi(token="hf_***")
Stop using use_auth_token in favor of token, everywhere
The token parameter can now be passed to every method in huggingface_hub. use_auth_token is still accepted where it previously existed, but the mid-term goal (~6 months) is to deprecate and remove it.
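For downstream code that wants to follow the same transition, a small shim can accept both arguments while steering callers toward token. This is only a sketch: resolve_token is a hypothetical helper, not part of huggingface_hub, and it only handles string tokens.

```python
import warnings
from typing import Optional


def resolve_token(token: Optional[str] = None, use_auth_token: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer `token`, fall back to the legacy argument."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, pass `token` instead.",
            FutureWarning,
            stacklevel=2,
        )
        if token is None:
            return use_auth_token
    return token
```

When both arguments are given, the sketch keeps token and only warns about the legacy one.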
Respect git credential helper from the user
Previously, the token was stored in the git credential store. It can now be stored in any helper configured by the user (keychain, cache, ...).
Better error handling
Helper to dump machine information
# Dump all relevant information. To be used when reporting an issue.
➜ huggingface-cli env
Copy-and-paste the text below in your GitHub issue.
- huggingface_hub version: 0.11.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
...
Misc
- Cache not found is not an error by @singingwolfboy in #1101
- Propagate error messages when multiple on BadRequest by @Wauplin in #1115
- Add error message from x-error-message header if exists by @Wauplin in #1121
Modelcards
Few improvements/fixes in the modelcard module:
- 🎨 make repocard content a property by @nateraw in #1147
- ✅ fix content string in repocard tests by @nateraw in #1155
- Add Hub verification token to evaluation metadata by @lewtun in #1142
- Use default model_name in metadata_update by @lvwerra in #1157
- Refer to modelcard creator app from doc by @Wauplin in #1184
- Parent Model --> Finetuned from model by @meg-huggingface #1191
- FIX overwriting metadata when both verified and unverified reported values by @Wauplin in #1186
Cache assets
New feature to provide a path in the cache where any downstream library can store assets (processed data, files from the web, extracted data, rendered images,...)
Documentation updates
- Fixing a typo in the doc. by @Narsil in #1113
- Fix docstring of list_datasets by @albertvillanova in #1125
- Add repo_type=dataset possibility to guide by @Wauplin in #1134
- Fix PyTorch & Keras mixin doc by @lewtun in #1139
- Update how-to-manage.mdx by @severo in #1150
- Typo fix by @meg-huggingface in #1166
- Adds link to model card metadata spec by @meg-huggingface in #1171
- Removing "Related Models" & just asking for "Parent Model" by @meg-huggingface in #1178
Breaking changes
- Cannot provide an organization to create_repo
- identical_ok removed in upload_file
- Breaking changes in arguments for validate_preupload_info, prepare_commit_payload, _upload_lfs_object (internal helpers for the commit API)
- huggingface_hub.snapshot_download is not exposed as a public module anymore
Deprecations
- Remove deprecated code from v0.9, v0.10 and v0.11 by @Wauplin in #1092
- Rename languages to language + remove duplicate code in tests by @Wauplin in #1169
- Deprecate output in list_models by @Wauplin in #1143
- Set back feature to create a repo when using clone_from by @Wauplin in #1187
Internal
- Configure pytest to run on staging by default + flags in config by @Wauplin in #1093
- fix search models test by @Wauplin in #1106
- Add mypy in the CI (and fix existing type issues) by @Wauplin in #1097
- Fix deprecation warnings for assertEquals in tests by @Wauplin in #1135
- Skip failing test in ci by @Wauplin in #1148
- 💚 fix mypy ci by @nateraw in #1167
- Update pr docs actions by @mishig25 in #1170
- Revert "Update pr docs actions" by @mishig25 #1192
Bugfixes & small improvements
- Expose list_spaces by @osanseviero in #1132
- respect NO_COLOR env var by @singingwolfboy in #1103
- Fix list_models bool parameters by @Wauplin in #1152
- FIX url encoding in hf_hub_url by @Wauplin in #1164
- Fix cannot create pr on foreign repo by @Wauplin #1183
- Fix HfApi.move_repo(...) and complete tests by @Wauplin in #1136
- Commit empty files as regular and warn user by @Wauplin in #1180
- Parse file size in get_hf_file_metadata by @Wauplin #1179
- Fix get file size on lfs by @Wauplin #1188
- More robust create relative symlink in cache by @Wauplin in #1109
- Test running CI on Python 3.11 #1189
Patch release v0.10.1
Hot-fix to force utf-8 encoding in modelcards. See #1102 and skops-dev/skops#162 (comment) for context.
Full Changelog: v0.10.0...v0.10.1
v0.10.0: Modelcards, cache management and more
Modelcards
Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.
>>> from huggingface_hub import ModelCard
>>> card = ModelCard.load('nateraw/vit-base-beans')
>>> card.data.to_dict()
{'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}
Related commits
- Add additional repo card utils from modelcards repo by @nateraw in #940
- Add regression test for empty modelcard update by @Wauplin in #1060
- Add template variables to dataset card template by @nateraw in #1068
- Further clarifying Model Card sections by @meg-huggingface in #1052
- Create modelcard if doesn't exist on update_metadata by @Wauplin in #1061
Related documentation
Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)
New commands in huggingface-cli to scan and delete parts of the cache. The goal is to manage the cache system the same way for any dependent library that uses huggingface_hub. Only the new cache-system format is supported.
➜ huggingface-cli scan-cache
REPO ID       REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
------------- --------- ------------ -------- ------------- ------------- ------------------- ---------------------------------------------------
glue          dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
google/fleurs dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/
(...)
Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.
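The LOCAL PATH column above ends in a folder such as datasets--glue. Assuming the folder name follows a "<type>s--<namespace>--<name>" pattern (an inference from that output, not a documented contract), a small standard-library sketch can map a folder name back to a repo type and repo id. The helper name parse_cache_folder_name is hypothetical.

```python
from typing import Tuple


def parse_cache_folder_name(folder_name: str) -> Tuple[str, str]:
    """Split a cache folder name such as 'datasets--glue' into (repo_type, repo_id)."""
    prefix, _, rest = folder_name.partition("--")
    if not rest or prefix not in {"models", "datasets", "spaces"}:
        raise ValueError(f"Unexpected cache folder name: {folder_name!r}")
    repo_type = prefix[:-1]  # 'datasets' -> 'dataset'
    repo_id = rest.replace("--", "/")  # 'google--fleurs' -> 'google/fleurs'
    return repo_type, repo_id
```

For real use, prefer the scan utility itself, which already reports repo ids and types.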
Related commits
- Feature: add an utility to scan cache by @Wauplin in #990
- Utility to delete revisions by @Wauplin in #1035
- 1025 add time details to scan cache by @Wauplin in #1045
- Fix scan cache failure when cached snapshot is empty by @Wauplin in #1054
- 1025 huggingface-cli delete-cache command by @Wauplin in #1046
- Sort repos/revisions by age in delete-cache by @Wauplin in #1063
Related documentation
Better error handling (and http-related stuff)
HTTP calls to the Hub have been harmonized to behave the same across the library.
Major differences are:
- Unified way to handle HTTP errors using hf_raise_for_status (more informative error message)
- Auth token is always sent by default when a user is logged in (see documentation).
- Package versions are sent as user-agent header for telemetry (python, huggingface_hub, tensorflow, torch, ...). It was already the case for hf_hub_download.
Related commits
- Always send the cached token when user is logged in by @Wauplin in #1064
- Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075
- [Repository] Add better error message by @patrickvonplaten in #993
- Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019
- Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038
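The backoff on HTTP 503 mentioned in the last item can be pictured with a generic exponential-backoff loop. This is a sketch under stated assumptions, not the library's implementation: call_with_backoff, its parameters and the doubling policy are all hypothetical, and the request is faked so the example runs offline.

```python
import time
from typing import Callable


def call_with_backoff(
    request: Callable[[], int],
    max_retries: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `request` until it stops returning 503, doubling the wait each time."""
    wait = base_wait
    status = request()
    for _ in range(max_retries):
        if status != 503:
            break
        sleep(wait)
        wait *= 2
        status = request()
    return status


# Fake endpoint: unavailable twice, then succeeds.
responses = iter([503, 503, 200])
waits = []
final_status = call_with_backoff(lambda: next(responses), sleep=waits.append)
```

Injecting the sleep function keeps the example fast and makes the wait sequence easy to inspect.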
Breaking changes
- For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (quite recent and niche output type).
- Since repo_id is now validated using @validate_hf_hub_args (see below), a breaking change can be caused if repo_id was previously misused. An HFValidationError is now raised if repo_id is not valid.
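To make the idea of decorator-based argument validation concrete, here is a minimal sketch. Everything in it is a stand-in: validate_args, check_repo_id, RepoIdError and the regular expression are hypothetical and deliberately simpler than whatever rules the library applies.

```python
import functools
import inspect
import re

# Simplified, hypothetical rule set; the library's real checks may differ.
_REPO_ID_PATTERN = re.compile(r"^[\w.-]+(/[\w.-]+)?$")


class RepoIdError(ValueError):
    """Stand-in for the library's HFValidationError."""


def check_repo_id(repo_id):
    if not isinstance(repo_id, str) or not _REPO_ID_PATTERN.match(repo_id):
        raise RepoIdError(f"Invalid repo_id: {repo_id!r}")


def validate_args(fn):
    """Validate the `repo_id` argument, however it is passed, before calling `fn`."""
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        if "repo_id" in bound.arguments:
            check_repo_id(bound.arguments["repo_id"])
        return fn(*args, **kwargs)

    return wrapper


@validate_args
def describe(repo_id, revision="main"):
    return f"{repo_id}@{revision}"
```

Binding the call against the function signature lets the check work whether repo_id is passed positionally or by keyword.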
Miscellaneous improvements
Add support for autocomplete
http-based push_to_hub_fastai
- Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040
Check if a file is cached
Get file metadata (commit hash, etag, location) without downloading
Validate arguments using @validate_hf_hub_args
- Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029
- Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031
This is a breaking change if repo_id was previously misused.
Related documentation:
Documentation updates
- Fix raise syntax: remove markdown bullet point by @mishig25 in #1034
- docs render tree correctly by @mishig25 in #1070
Deprecations
- ENH Deprecate clone_from behavior by @merveenoyan in #952
- 🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928
- Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059
- 1055 deprecate private and repo type in repository class by @Wauplin in #1057
Bugfixes & small improvements
- Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021
- enable http request retry under proxy by @MrZhengXin in #1022
- Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026
- 👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048
- Use /models/ path for api call to update settings by @Wauplin in #1049
- Globally set git credential.helper to store in google colab by @Wauplin in #1053
- FIX notebook login by @Wauplin in #1073
Windows-specific bug fixes
Patch release v0.9.1
Hot-fix error message on gated repositories (#1015).
Context: https://huggingface.co/CompVis/stable-diffusion-v1-4 has been widely shared in the last few days, but since it is a gated repo, many users were confused by the authentication error they received. The error message is now more detailed.
Full Changelog: v0.9.0...v0.9.1
v0.9.0: Community API and new `push_to_hub` mixins
Community API
Huge work to programmatically interact with the community tab, thanks to @SBrandeis !
It is now possible to:
- Manage discussions (create_discussion, create_pull_request, merge_pull_request, change_discussion_status, rename_discussion)
- Comment on them (comment_discussion, edit_discussion_comment)
- List them (get_repo_discussions, get_discussion_details)
See full documentation for more details.
- ✨ Programmatic API for the community tab by @SBrandeis in #930
HTTP-based push_to_hub mixins
The push_to_hub mixin and push_to_hub_keras have been refactored to leverage the HTTP endpoint. This means pushing to the Hub no longer requires first downloading the repo locally. The previous git-based version is planned to be supported until v0.12.
- Push to hub mixins that do not leverage git by @LysandreJik in #847
Miscellaneous API improvements
- ✨ parent_commit argument for create_commit and related functions by @SBrandeis in #916
- Add a helpful error message when commit_message is empty in create_commit by @sgugger in #962
- ✨ create_commit: more user-friendly errors on HTTP 400 by @SBrandeis in #963
- ✨ Add files_metadata option to repo_info by @SBrandeis in #951
- Add list_spaces to HfApi by @cakiki in #889
Miscellaneous helpers (advanced)
Filter which files to upload in upload_folder
Non-existence of files in a repo is now cached
Progress bars can be globally disabled via the HF_HUB_DISABLE_PROGRESS_BARS env variable or using disable_progress_bars/enable_progress_bars helpers.
Use try_to_load_from_cache to check if a file is locally cached
Documentation updates
- [Doc] Update "Download files from the Hub" doc by @julien-c in #948
- Docs: Fix some missing images and broken links by @NimaBoscarino in #936
- Replace upload_file with upload_folder in upload_folder docstring by @mariosasko in #927
- Clarify upload docs by @stevhliu in #944
Bugfixes & small improvements
- Handle redirections in hf_hub_download for a renamed repo by @Wauplin in #983
- PR Make path_in_repo optional in upload folder by @Wauplin in #988
by @Wauplin in #988 - Use a finer exception when local_files_only=True and a file is missing in cache by @Wauplin in #985
- use fixes JSONDecodeError by @Wauplin in #974
- 🐛 Fix PR creation for a repo the user does not own by @SBrandeis in #922
- login: tiny messaging tweak by @julien-c in #964
- Display endpoint URL in whoami command by @juliensimon in #895
- Small orphaned tweaks from #947 by @julien-c in #958
- FIX LFS track fix for Hub Mixin by @merveenoyan in #919
- 🐛 fix multilinguality test and example by @nateraw in #941
- Fix custom handling of refined HTTPError by @osanseviero in #924
- Followup to #901: Tweak repocard_types.py by @julien-c in #931
- [Keras Mixin] - Flattening out nested configurations for better table parsing. by @ariG23498 in #914
- [Keras Mixin] Rendering the Hyperparameter table vertically by @ariG23498 in #917
Internal
- Disable codecov + configure pytest FutureWarnings by @Wauplin in #976
- Enable coverage in CI by @Wauplin in #992
- Enable flake8 on W605 by @Wauplin in #975
- Enable flake8-bugbear + adapt existing codebase by @Wauplin in #967
- Test that TensorFlow is not imported on startup by @lhoestq in #904
- Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #934
- Update dev version by @gante in #921
v0.8.1: lazy loading, git-aware cache file layout, new create_commit
Git-aware cache file layout
v0.8.1 introduces a new way of caching files from the Hugging Face Hub for two methods: snapshot_download and hf_hub_download.
The new approach is extensively documented in the Documenting files guide and we recommend checking it out to get a better understanding of how caching works.
New create_commit API
A new create_commit API allows users to upload and delete several files at once using HTTP-based methods. You can read more about it in this guide. The following convenience methods were also introduced:
- upload_folder: Allows uploading a local directory to a repo.
- delete_file: Allows deleting a single file from a repo.
upload_file now uses create_commit under the hood.
create_commit also allows creating pull requests with a create_pr=True flag.
None of the methods rely on Git locally.
- New create_commit API by @SBrandeis in #888
Lazy loading
All modules will now be lazy-loaded. This should drastically reduce the time it takes to import huggingface_hub as it will no longer load all soft dependencies.
- ENH lazy load modules in the root init by @adrinjalali in #874
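One common way to get lazy loading in Python is a module-level __getattr__ (PEP 562), which resolves an attribute only when it is first accessed. The sketch below illustrates that general technique with standard-library modules; make_lazy_module is a hypothetical helper and is not claimed to match how huggingface_hub implements it.

```python
import importlib
import types


def make_lazy_module(name: str, attr_to_module: dict) -> types.ModuleType:
    """Build a module whose attributes are imported on first access (PEP 562 style)."""
    module = types.ModuleType(name)

    def __getattr__(attr: str):
        if attr not in attr_to_module:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(attr_to_module[attr]), attr)
        setattr(module, attr, value)  # cache so __getattr__ is not called again
        return value

    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", {"sqrt": "math", "dumps": "json"})
resolved_before = "sqrt" in vars(lazy)  # nothing has been resolved yet
root = lazy.sqrt(9.0)  # first access triggers the import
resolved_after = "sqrt" in vars(lazy)
```

Attributes that are never accessed never trigger their import, which is where the startup-time saving comes from.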
Improvements and bugfixes
- Add request ID to all requests by @LysandreJik in #909
- Remove deprecations by @LysandreJik in #910
- FIX Avoid creating repository when it exists on remote by @merveenoyan in #900
- 🏗 Use hub-ci for tests by @SBrandeis in #898
- Refine 404 errors by @LysandreJik in #878
- Fix typo by @lsb in #902
- FIX metadata_update: work on a copy of the upstream file, to not mess up the cache by @julien-c in #891
- ENH Removed history writing in Keras model card by @merveenoyan in #876
- CI enable codecov by @adrinjalali in #893
- MNT deprecate imports from snapshot_download by @adrinjalali in #880
- Pushback deprecation for v0.7 release by @LysandreJik in #882
- FIX make import machinery private by @adrinjalali in #879
- ENH Keras Use table instead of dictionary for hyperparameters in model card by @merveenoyan in #877
- Invert deprecation for create_repo in #912
- Constant was accidentally removed during deprecation transition in #913
v0.7.0: Repocard metadata
Repocard metadata
This PR adds a metadata_update function that allows the user to update the metadata in a repository on the hub. The function accepts a dict with metadata (following the same pattern as the YAML in the README) and behaves as follows for all top level fields except model-index.
Examples:
Starting from
existing_results = [{
'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}
}]
1. Overwrite existing metric value in existing result
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["value"] = 0.999
_update_metadata_model_index(existing_results, new_results, overwrite=True)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.999}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
2. Add new metric to existing result
new_results = deepcopy(existing_results)
new_results[0]["metrics"][0]["name"] = "Recall"
new_results[0]["metrics"][0]["type"] = "recall"
_update_metadata_model_index(existing_results, new_results, overwrite=False)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995},
{'name': 'Recall', 'type': 'recall', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
3. Add new result
new_results = deepcopy(existing_results)
new_results[0]["dataset"] = {'name': 'IMDb-2', 'type': 'imdb_2'}
_update_metadata_model_index(existing_results, new_results, overwrite=False)
[{'dataset': {'name': 'IMDb', 'type': 'imdb'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}},
{'dataset': {'name': 'IMDb-2', 'type': 'imdb_2'},
'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.995}],
'task': {'name': 'Text Classification', 'type': 'text-classification'}}]
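The three cases above can be reproduced with a simplified merge routine. The function below is a stand-in written for illustration, not the library's _update_metadata_model_index: the name merge_results, the matching keys (dataset plus task for results, name plus type for metrics) and the error raised on conflicting values are assumptions drawn from the examples.

```python
from copy import deepcopy
from typing import Any, Dict, List

Result = Dict[str, Any]


def merge_results(existing: List[Result], new: List[Result], overwrite: bool = False) -> List[Result]:
    """Simplified stand-in for the merge rules illustrated by the three examples."""
    merged = deepcopy(existing)
    for new_result in new:
        # Assumption: a result is identified by its (dataset, task) pair.
        match = next(
            (r for r in merged
             if r["dataset"] == new_result["dataset"] and r["task"] == new_result["task"]),
            None,
        )
        if match is None:
            merged.append(deepcopy(new_result))  # case 3: brand-new result
            continue
        for new_metric in new_result["metrics"]:
            same = next(
                (m for m in match["metrics"]
                 if (m["name"], m["type"]) == (new_metric["name"], new_metric["type"])),
                None,
            )
            if same is None:
                match["metrics"].append(deepcopy(new_metric))  # case 2: new metric
            elif same["value"] != new_metric["value"]:
                if not overwrite:
                    raise ValueError(f"Metric {new_metric['name']!r} already has a different value.")
                same["value"] = new_metric["value"]  # case 1: overwrite existing value
    return merged


existing = [{
    "dataset": {"name": "IMDb", "type": "imdb"},
    "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.995}],
    "task": {"name": "Text Classification", "type": "text-classification"},
}]

case_1 = deepcopy(existing)
case_1[0]["metrics"][0]["value"] = 0.999
overwritten = merge_results(existing, case_1, overwrite=True)

case_2 = deepcopy(existing)
case_2[0]["metrics"][0].update({"name": "Recall", "type": "recall"})
with_new_metric = merge_results(existing, case_2)

case_3 = deepcopy(existing)
case_3[0]["dataset"] = {"name": "IMDb-2", "type": "imdb_2"}
with_new_result = merge_results(existing, case_3)
```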
Improvements and bug fixes
- Keras: Saving history in a JSON file by @merveenoyan in #861
- space after uri by @leondz in #866
v0.6.0: fastai support, binary file support, skip LFS files when pushing to the hub
Disclaimer: This release was initially advertised as including #844. That change was not part of this release and will ship in v0.7.
fastai support
v0.6.0 introduces downstream (download) and upstream (upload) support for the fastai libraries. It supports fastai versions above 2.4.
The integration is detailed in the following blog.
- Add fastai upstream and downstream capacities for fastai>=2.4 and fastcore>=1.3.27 versions by @omarespejel in #678
Automatic binary file tracking in Repository
Binary files are now rejected by default by the Hub. v0.6.0 introduces automatic binary file tracking through the auto_lfs_track argument of the Repository.git_add method. It also introduces the Repository.auto_track_binary_files method which can be used independently of other methods.
- ENH Auto track binary files in Repository by @LysandreJik in #828
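Deciding whether a file is binary usually relies on a heuristic. The sketch below uses one widely used heuristic, looking for a NUL byte in the first bytes of the file. It is an assumption for illustration only: looks_binary is a hypothetical helper, and Repository may decide differently.

```python
import os
import tempfile


def looks_binary(path: str, sample_size: int = 1024) -> bool:
    """Heuristic: a NUL byte in the first `sample_size` bytes suggests a binary file."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "config.json")
    binary_path = os.path.join(tmp, "weights.bin")
    with open(text_path, "w", encoding="utf-8") as handle:
        handle.write('{"hidden_size": 768}')
    with open(binary_path, "wb") as handle:
        handle.write(b"\x00\x01\x02\x03")
    text_is_binary = looks_binary(text_path)
    weights_are_binary = looks_binary(binary_path)
```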
skip_lfs_files is now added to mixins
The parameter skip_lfs_files is now added to the different mixins. This enables pushing files to the hub without first downloading the files above 10MB. This should dramatically reduce the time needed when updating a modelcard, a configuration file, and others.
Keras support improvement
The support for Keras model is greatly improved through several additions:
- The save_pretrained_keras method now accepts a list of tags that will automatically be added to the repository.
- Download statistics are now available on Keras models
- Introducing list of tags to Keras model card by @merveenoyan in #806
- Enable keras download stats by @merveenoyan in #860
Bugfixes and improvements
- FIX don't raise if name/organization are passed positionally by @adrinjalali in #822
- ENH Use provided token from HUGGING_FACE_HUB_TOKEN env variable if available by @FrancescoSaverioZuppichini in #794
- tests(hf_api): remove infectionTypes field by @McPatate in #834
- Remove docs, tasks and inference API from huggingface_hub by @osanseviero in #833
- FEAT Uniformize hf_api a bit and add support for Spaces by @julien-c in #792
- Add a bug report template by @osanseviero in #832
- clean up formatting by @stevhliu in #839
- Release guide by @LysandreJik in #820
- Fix keras test by @osanseviero in #855
- DOC Add quick start guide by @stevhliu in #850
- MNT refactor: subprocess.run -> run_subprocess by @LysandreJik in #352
- MNT enable preview on black by @adrinjalali in #849
- Update how to guides by @stevhliu in #840
- Update contribution guide for merging PRs by @stevhliu in #856
- DOC Update landing page by @stevhliu in #854
- space after uri by @leondz in #866
v0.5.1: Patch release
This is a patch release fixing a breaking backward compatibility issue.
Linked PR: #822