Releases: huggingface/huggingface_hub
v0.0.14: LFS Auto tracking, `dataset_info` and `list_datasets`, documentation
Datasets
Dataset repositories get better support, first by enabling full usage of the `Repository` class for dataset repositories:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<dataset_id>", repo_type="dataset")
```
Datasets can now be retrieved from the Python runtime using the `list_datasets` method of the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
datasets = api.list_datasets()
len(datasets)
# 1048 publicly available dataset repositories at the time of writing
```
Information about a specific dataset can be retrieved using the `dataset_info` method of the `HfApi` class:

```python
from huggingface_hub import HfApi

api = HfApi()
api.dataset_info("squad")
# DatasetInfo: {
#     id: squad
#     lastModified: 2021-07-07T13:18:53.595Z
#     tags: ['pretty_name:SQuAD', 'annotations_creators:crowdsourced', 'language_creators:crowdsourced', 'language_creators:found',
# [...]
```
- Add dataset_info and list_datasets #164 (@lhoestq)
- Enable dataset repositories #151 (@LysandreJik)
Inference API wrapper client
Version v0.0.14 introduces a wrapper client for the Inference API, so there is no need to craft custom `requests` calls anymore. See below for an example.

```python
from huggingface_hub import InferenceApi

api = InferenceApi("bert-base-uncased")
api(inputs="The [MASK] is great")
# [
#     {'sequence': 'the music is great', 'score': 0.03599703311920166, 'token': 2189, 'token_str': 'music'},
#     {'sequence': 'the price is great', 'score': 0.02146693877875805, 'token': 3976, 'token_str': 'price'},
#     {'sequence': 'the money is great', 'score': 0.01866752654314041, 'token': 2769, 'token_str': 'money'},
#     {'sequence': 'the fun is great', 'score': 0.01654735580086708, 'token': 4569, 'token_str': 'fun'},
#     {'sequence': 'the effect is great', 'score': 0.015102624893188477, 'token': 3466, 'token_str': 'effect'}
# ]
```
- Inference API wrapper client #65 (@osanseviero)
Auto-track with LFS
Version v0.0.14 introduces an auto-tracking mechanism with git-lfs for large files: files larger than 10MB can be automatically tracked by using the `auto_track_large_files` method:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")
# save large files in `local_directory`
repo.git_add()
repo.auto_track_large_files()
repo.git_commit("Add large files")
repo.git_push()
# No push rejected error anymore!
```
It is used automatically when leveraging the `commit` context manager:

```python
from huggingface_hub import Repository

repo = Repository("local_directory", clone_from="<user>/<model_id>")
with repo.commit("Add large files"):
    # add large files
    ...
# No push rejected error anymore!
```
- Auto track with LFS #177 (@LysandreJik)
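For intuition, the auto-tracking step boils down to scanning the working tree for files above the size threshold and registering each one in `.gitattributes` as an LFS-tracked path. Below is a minimal standalone sketch of that idea; the helper names and details are illustrative, not the library's actual implementation:

```python
import os

# Threshold mirroring the 10MB rule described above (illustrative constant).
LFS_THRESHOLD_BYTES = 10 * 1024 * 1024


def find_large_files(directory, threshold=LFS_THRESHOLD_BYTES):
    """Return paths (relative to `directory`) of files larger than `threshold` bytes."""
    large = []
    for root, _dirs, files in os.walk(directory):
        if ".git" in root.split(os.sep):
            continue  # skip git internals
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold:
                large.append(os.path.relpath(path, directory))
    return large


def track_with_lfs(directory, threshold=LFS_THRESHOLD_BYTES):
    """Append an LFS tracking rule to .gitattributes for each large file found."""
    attributes_path = os.path.join(directory, ".gitattributes")
    existing = ""
    if os.path.exists(attributes_path):
        with open(attributes_path) as f:
            existing = f.read()
    with open(attributes_path, "a") as f:
        for rel_path in find_large_files(directory, threshold):
            rule = f"{rel_path} filter=lfs diff=lfs merge=lfs -text"
            if rule not in existing:
                f.write(rule + "\n")
```

The actual method also coordinates with `git add`, but the `.gitattributes` rule shown here is the standard git-lfs tracking syntax.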
Documentation
- Update docs structure #145 (@Pierrci)
- Update links to docs #147 (@LysandreJik)
- Add new repo guide #153 (@osanseviero)
- Add documentation for endpoints #155 (@osanseviero)
- Document hf.co webhook publicly #156 (@julien-c)
- docs: ✏️ mention the Training metrics tab #193 (@severo)
- doc for Spaces #189 (@julien-c)
Breaking changes
Reminder: the `huggingface_hub` library follows semantic versioning and is under active development. Until the first major version (v1.0.0) is released, you should expect breaking changes, and we strongly recommend pinning the library to a specific version.
Two breaking changes are introduced with version v0.0.14.
The `whoami` return value changes from a tuple to a dictionary
- Allow obtaining Inference API tokens with whoami #157 (@osanseviero)
The `whoami` method changes its return value from a tuple of `(<user>, [<organisations>])` to a dictionary containing much more information.
In versions v0.0.13 and below, here was the behavior of the `whoami` method of the `HfApi` class:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# ('<user>', ['<org_0>', '<org_1>'])
```
In version v0.0.14, this is updated to the following:

```python
from huggingface_hub import HfFolder, HfApi

api = HfApi()
api.whoami(HfFolder.get_token())
# {
#     'type': str,
#     'name': str,
#     'fullname': str,
#     'email': str,
#     'emailVerified': bool,
#     'apiToken': str,
#     'plan': str,
#     'avatarUrl': str,
#     'orgs': List[str]
# }
```
The `Repository` class's `use_auth_token` initialization parameter now defaults to `True`.
The `use_auth_token` initialization parameter of the `Repository` class now defaults to `True`. The behavior is unchanged for users that are not logged in, in which case `Repository` remains agnostic to `huggingface_hub` authentication.
- Set use_auth_token to True by default #204 (@LysandreJik)
Improvements and bugfixes
- Add sklearn code snippet #133 (@osanseviero)
- Allow passing only model ID to clone when authenticated #150 (@LysandreJik)
- More robust endpoint with toggled staging endpoint #148 (@LysandreJik)
- Add config to list_models #152 (@osanseviero)
- Fix audio-to-audio widget and add icon #142 (@osanseviero)
- Upgrade spaCy to api 0.0.12 and remove allowlist #161 (@osanseviero)
- docs: fix webhook response format #162 (@severo)
- Update link in README.md #163 (@nateraw)
- Revert "docs: fix webhook response format (#162)" #165 (@severo)
- Add Keras docker image #117 (@osanseviero)
- Allow multiple models when testing a pipeline #124 (@osanseviero)
- scikit rebased #170 (@Narsil)
- Upgrading community frameworks to `audio-to-audio`. #94 (@Narsil)
- Add sagemaker docs #173 (@philschmid)
- Add Structured Data Classification as task #172 (@osanseviero)
- Fixing keras outputs (widgets was ignoring because of type mismatch, now testing for it) #176 (@Narsil)
- Updating spacy. #179 (@Narsil)
- Create initial superb docker image structure #181 (@osanseviero)
- Upgrading asteroid image. #175 (@Narsil)
- Removing tests on huggingface_hub for unrelated changes in api-inference-community #180 (@Narsil)
- Fixing audio-to-audio validation. #184 (@Narsil)
- rmdir `api-inference-community/src/sentence-transformers` #188 (@Pierrci)
- Allow generic inference for ASR for superb #185 (@osanseviero)
- Add timestamp to snapshot download tests #201 (@LysandreJik)
- No need for token to understand HF urls #203 (@LysandreJik)
- Remove `--no_renames` argument to list deleted files. #205 (@LysandreJik)
v0.0.13: Context Manager
Version 0.0.13 introduces a context manager to save files directly to the Hub. See below for some examples.
Example with a single file:

```python
import json

from huggingface_hub import Repository

repo = Repository("text-files", clone_from="<user>/text-files", use_auth_token=True)
with repo.commit("My first file."):
    with open("file.txt", "w+") as f:
        f.write(json.dumps({"key": "value"}))
```
Example with a `torch.save` statement:

```python
import torch

from huggingface_hub import Repository

model = torch.nn.Transformer()
repo = Repository("torch-files", clone_from="<user>/torch-files", use_auth_token=True)
with repo.commit("Adding my cool model!"):
    torch.save(model.state_dict(), "model.pt")
```
Example with a Flax/JAX serialization statement:

```python
from flax import linen as nn
from flax import serialization
from jax import random

from huggingface_hub import Repository

model = nn.Dense(features=5)
key1, key2 = random.split(random.PRNGKey(0))
x = random.normal(key1, (10,))
params = model.init(key2, x)
bytes_output = serialization.to_bytes(params)

repo = Repository("flax-model", clone_from="<user>/flax-model", use_auth_token=True)
with repo.commit("Adding my cool Flax model!"):
    with open("flax_model.msgpack", "wb") as f:
        f.write(bytes_output)
```
Patch release: Repository clones
Patches an issue when cloning a repository twice.
v0.0.11: Improved documentation, `hf_hub_download` and `Repository` power-up
Improved documentation
The `huggingface_hub` documentation is now available on hf.co/docs! Additionally, a new step-by-step guide to adding libraries is available.
- New documentation for 🤗 Hub #71 (@osanseviero)
- Step by step guide on adding Model Hub support to libraries #86 (@LysandreJik)
New method: hf_hub_download
A new method is introduced: `hf_hub_download`. It is the equivalent of doing `cached_download(hf_hub_url())`, in a single method.
- HF Hub download #137 (@LysandreJik)
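For context, `hf_hub_url` builds the "resolve" URL for a file in a repo, and `cached_download` fetches and caches it; `hf_hub_download` chains the two. A rough, simplified sketch of the URL-building half (the real function also handles repo types, subfolders, and other details):

```python
def hf_hub_url(repo_id: str, filename: str, revision: str = None) -> str:
    """Build the hf.co 'resolve' URL for a file in a model repo (simplified sketch)."""
    revision = revision or "main"
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


# hf_hub_download(repo_id, filename) is then roughly
# cached_download(hf_hub_url(repo_id, filename)):
url = hf_hub_url("lysandre/dummy-hf-hub", "README.md")
# https://huggingface.co/lysandre/dummy-hf-hub/resolve/main/README.md
```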
`Repository` power-up
The `Repository` class is updated to behave more like git: it is now impossible to clone a repository into a folder that already contains files.

The PyTorch Mixin contributed by @vasudevgupta7 is slightly updated so that the `push_to_hub` method manages a repository as one would from the command line.
- Repository power-up #132 (@LysandreJik)
Improvement & Fixes
- Adding `audio-to-audio` task. #93 (@Narsil)
- When pipelines fail to load in framework code, for whatever reason #96 (@Narsil)
- Solve `rmtree` issue on windows #105 (@SBrandeis)
- Add identical_ok option to HfApi.upload_file method #102 (@SBrandeis)
- Solve compatibility issues when calling `subprocess.run` #104 (@SBrandeis)
- Open source Inference widgets + optimize for community contributions #87 (@julien-c)
- model `tags` can be `undefined` #107 (@Pierrci)
- Doc tweaks #109 (@julien-c)
- [huggingface_hub] Support for spaces #108 (@julien-c)
- speechbrain library tag + code snippet #73 (@osanseviero)
- Allow batching for feature-extraction #106 (@osanseviero)
- adding audio-to-audio widget. #95 (@Narsil)
- Add image to text (for image captioning) #114 (@osanseviero)
- Add formatting and upgrade Sentence Transformers api version for better error messages #119 (@osanseviero)
- Change videos in docs so they are played directly in our site #120 (@osanseviero)
- Fix inference API GitHub actions #125 (@osanseviero)
- Fixing sentence-transformers CACHE value for docker + functools (docker needs Py3.8) #123 (@Narsil)
- Load errors with flair should now be generating proper API errors. #121 (@Narsil)
- Simplify manage to autodetect task+framework if possible. #122 (@Narsil)
- Change sentence transformers source to original repo #128 (@osanseviero)
- Allow Python versions with letters in the minor version suffix #82 (@ulf1)
- Update `upload_file` docs #136 (@LysandreJik)
- Reformat repo README #130 (@osanseviero)
- Add config to model info #135 (@osanseviero)
- Add input validation for structured-data-classification #97 (@osanseviero)
v0.0.10: Merging `huggingface_hub` with `api-inference-community` and hub interfaces
v0.0.10 marks the merger of three components of the Hugging Face stack: the `huggingface_hub` repository is now the central platform for contributing new libraries to be supported on the Hub.
It regroups three previously separate components:
- The `huggingface_hub` Python library, as the Python library to download, upload, and retrieve information from the Hub.
- The `api-inference-community`, as the platform where libraries wishing for Hub support may be added.
- The `interfaces`, as the definition of pipeline types as well as default widget inputs and definitions/UI elements for third-party libraries.
Future efforts will focus on further easing the contribution of third-party libraries to the Hugging Face Hub.
Improvement & Fixes
- Add typing extensions to conda yaml file #49 (@LysandreJik)
- Alignment on modelcard metadata specification #39 (@LysandreJik)
- Bring interfaces from `widgets-server` #50 (@julien-c)
- Sentence similarity default widget and pipeline type #52 (@osanseviero)
- [interfaces] Expose configuration options for external libraries #51 (@julien-c)
- Adding `api-inference-community` to `huggingface_hub`. #48 (@Narsil)
- Add TensorFlowTTS as library + code snippet #55 (@osanseviero)
- Add protobuf as a dependency to handle tokenizers that require it: #58 (@Narsil)
- Update validation for NLP tasks #59 (@osanseviero)
- spaCy code snippet and language tag #57 (@osanseviero)
- SpaCy fixes #60 (@osanseviero)
- Allow changing repo visibility programmatically #61 (@osanseviero)
- Add Adapter Transformers snippet #62 (@osanseviero)
- Change order in spaCy snippet #66 (@osanseviero)
- Add validation to check all rows in table question answering have same length #67 (@osanseviero)
- added question-answering part for Bengali language #68 (@sagorbrur)
- Add spaCy to inference API #63 (@osanseviero)
- AllenNLP library tag + code snippet #72 (@osanseviero)
- Fix AllenNLP QA example #80 (@epwalsh)
- do not crash even if this config isn't set #81 (@julien-c)
- Mark model config as optional #83 (@Pierrci)
- Add repr() to ModelFile and RepoObj #75 (@lewtun)
- Refactor create_repo #84 (@SBrandeis)
v0.0.9: HTTP File uploads, multiple filter model selection
Support for large file uploads
Implementation of an endpoint to programmatically upload (large) files to any repo on the hub, without the need for git, using HTTP POST requests.
- [API] Support for the file upload endpoint #42 (@SBrandeis)
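Conceptually, the new endpoint lets a client push a file's raw bytes in an authenticated HTTP POST instead of going through git. A sketch of building such a request with the standard library; note that the endpoint path and token value below are illustrative placeholders, not the Hub's actual route:

```python
import urllib.request


def build_upload_request(repo_id: str, path_in_repo: str, data: bytes, token: str):
    """Build (but do not send) an authenticated HTTP POST carrying the file bytes.

    The URL path used here is a hypothetical stand-in for the real upload endpoint.
    """
    url = f"https://huggingface.co/api/{repo_id}/upload/{path_in_repo}"
    return urllib.request.Request(
        url,
        data=data,  # raw file content in the request body
        method="POST",
        headers={"authorization": f"Bearer {token}"},
    )


# Preparing (not sending) a request for a small binary file:
request = build_upload_request("user/repo", "weights.bin", b"\x00\x01", token="<token>")
```

In the library itself this is wrapped by `HfApi.upload_file`, so users never build the request by hand.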
The `HfApi.list_models` method now allows multiple filters
Models may now be filtered using several filters. Example usage:

```python
>>> from huggingface_hub import HfApi
>>> api = HfApi()

>>> # List all models
>>> api.list_models()

>>> # List only the text classification models
>>> api.list_models(filter="text-classification")

>>> # List only the russian models compatible with pytorch
>>> api.list_models(filter=("ru", "pytorch"))

>>> # List only the models trained on the "common_voice" dataset
>>> api.list_models(filter="dataset:common_voice")

>>> # List only the models from the AllenNLP library
>>> api.list_models(filter="allennlp")
```
- Document the `filter` argument #41 (@LysandreJik)
`ModelInfo` now has a readable representation

Improvement of the `ModelInfo` class so that it displays information about the object.
- Include a readable repr for ModelInfo #32 (@muellerzr)
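The pattern behind this change is simple: implement `__repr__` so the object prints its fields instead of a bare class reference. A generic sketch of the idea (this is a stand-in, not the library's actual `ModelInfo` code):

```python
class ModelInfo:
    """Minimal stand-in showing the readable-repr pattern."""

    def __init__(self, modelId=None, tags=None, **kwargs):
        self.modelId = modelId
        self.tags = tags or []
        # keep any extra server-provided fields as attributes
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __repr__(self):
        # list every attribute as key=value in the printed form
        fields = ", ".join(f"{k}={v!r}" for k, v in vars(self).items())
        return f"ModelInfo({fields})"


print(ModelInfo(modelId="bert-base-uncased", tags=["pytorch"]))
# ModelInfo(modelId='bert-base-uncased', tags=['pytorch'])
```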
Improvements and bugfixes
- Fix conda by specifying python version + add tests to main branch #28 (@LysandreJik)
- Improve Mixin #34 (@LysandreJik)
- Enable `library_name` and `library_version` in `snapshot_download` #38 (@LysandreJik)
- [Windows support] Very long filenames #40 (@LysandreJik)
- Make error message more verbose when creating a repo #44 (@osanseviero)
- Open-source /docs #46 (@julien-c)
v0.0.8: Model Info, Snapshot download
- Addition of the `HfApi.model_info` method to retrieve information about a repo given a revision.
- The accompanying `snapshot_download` utility to download to the cache all files stored in that repo at that given revision.
Example usage of `HfApi.model_info`:

```python
from huggingface_hub import HfApi

hf_api = HfApi()
model_info = hf_api.model_info("lysandre/dummy-hf-hub")
print("Model ID:", model_info.modelId)
for file in model_info.siblings:
    print("file:", file.rfilename)
```

outputs:

```
Model ID: lysandre/dummy-hf-hub
file: .gitattributes
file: README.md
```
Example usage of `snapshot_download`:

```python
import os

from huggingface_hub import snapshot_download

repo_path = snapshot_download("lysandre/dummy-hf-hub")
print(os.listdir(repo_path))
```

outputs:

```
['.gitattributes', 'README.md']
```
v0.0.7: Networking improvements + PyTorch mixin
- Networking improvements by @Pierrci and @lhoestq (#21 and #22)
- Adding a mixin class for easily saving, uploading, and downloading a PyTorch model. See PR #11 by @vasudevgupta7.
Example usage:

```python
import torch.nn as nn

from huggingface_hub import ModelHubMixin

class MyModel(nn.Module, ModelHubMixin):
    def __init__(self, **kwargs):
        super().__init__()
        self.config = kwargs.pop("config", None)
        self.layer = ...

    def forward(self, ...):
        return ...

model = MyModel()

# saving model to local directory & pushing to hub
model.save_pretrained("mymodel", push_to_hub=True, config={"act": "gelu"})

# initializing the model & loading it from trained weights
model = MyModel.from_pretrained("username/mymodel@main")
```
Thanks a ton for your contributions
v0.0.6: `Repository` class + other tweaks
v0.0.5