[Doc] Update "Download files from the Hub" doc #948

Merged 7 commits, Jul 26, 2022.
4 changes: 2 additions & 2 deletions README.md

The `huggingface_hub` is a client library to interact with the Hugging Face Hub. The Hugging Face Hub is a platform with over 35K models, 4K datasets, and 2K demos in which people can easily collaborate in their ML workflows. The Hub works as a central place where anyone can share, explore, discover, and experiment with open-source Machine Learning.

With `huggingface_hub`, you can easily download and upload models, datasets, and Spaces. You can extract useful information from the Hub, and do much more. Some example use cases:
* Downloading and caching files from a Hub repository.
* Creating repositories and uploading an updated model every few epochs.
* Extracting metadata from all models that match certain criteria (e.g. models for `text-classification`).
We're partnering with cool open source ML libraries to provide free model hosting and versioning.

The advantages are:

- Free model or dataset hosting for libraries and their users.
- Built-in file versioning, even with very large files, thanks to a git-based approach.
- Hosted inference API for all models publicly available.
- In-browser widgets to play with the uploaded models.
68 changes: 22 additions & 46 deletions docs/source/how-to-downstream.mdx
stored on the Hub. You can use these functions independently or integrate them into your
own library, making it more convenient for your users to interact with the Hub. This
guide will show you how to:

* Download and store a file from the Hub.
* Download all the files in a repository.

## Download and store a file from the Hub

The [`hf_hub_download`] function is the main function for downloading files from the Hub.

It downloads the remote file, stores it on disk (in a version-aware way), and returns its local file path.

Use the `repo_id` and `filename` parameters to specify which file to download:

```python
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json'
```

Use the `revision` parameter to download a file from a specific version, which can be a
branch name, a tag, or a commit hash. When using the commit hash, it must be the
full-length hash instead of a 7-character commit hash:

```python
>>> hf_hub_download(
... repo_id="lysandre/arxiv-nlp",
... filename="config.json",
... revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a",
... )
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/877b84a8f93f2d619faa2a6e514a32beef88ab0a/config.json'
```

To specify a file revision with the branch name:

```python
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="main")
```

To specify a file revision with a tag identifier, for example the `v1.0` version of the
`config.json` file:

```python
>>> hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
```
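The local paths returned above follow a predictable, version-aware layout inside the cache directory. As a rough sketch of that layout (the helper below is illustrative only, not part of the library's API):

```python
def cached_snapshot_path(cache_dir: str, repo_id: str, commit_hash: str) -> str:
    # Illustrative only: mirrors the layout visible in the outputs above,
    # where "/" in the repo id becomes "--" and files live under snapshots/<commit>.
    repo_folder = "models--" + repo_id.replace("/", "--")
    return f"{cache_dir}/{repo_folder}/snapshots/{commit_hash}"

print(cached_snapshot_path(
    "/root/.cache/huggingface/hub",
    "lysandre/arxiv-nlp",
    "877b84a8f93f2d619faa2a6e514a32beef88ab0a",
))
# /root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/877b84a8f93f2d619faa2a6e514a32beef88ab0a
```

Because the snapshot folder is keyed by commit hash, downloading the same file at two different revisions stores two separate copies instead of overwriting one.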

## Construct a download URL

If you want to construct the URL used to download a file from a repository, use [`hf_hub_url`], which returns the file's URL.
Note that it is used internally by [`hf_hub_download`].
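The URL follows the pattern shown in the examples above. A minimal sketch of that pattern (illustrative only; [`hf_hub_url`] itself also handles parameters such as `repo_type` and `subfolder`):

```python
def build_hub_url(repo_id: str, filename: str, revision: str = "main") -> str:
    # Illustrative sketch of the URL pattern; use hf_hub_url for real code.
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(build_hub_url("lysandre/arxiv-nlp", "config.json"))
# https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json
```

The `revision` segment accepts the same values as the download functions: a branch name, a tag, or a full commit hash.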

## Download an entire repository

[`snapshot_download`] downloads an entire repository at a given revision. Like
[`hf_hub_download`], all downloaded files are cached on your local disk.

Download a whole repository as shown in the following:

To download from a specific repository revision, pass the `revision` parameter to [`snapshot_download`].

In general, it is usually better to download files with [`hf_hub_download`] if you
already know the file names you need.
[`snapshot_download`] is helpful when you are unaware of which files to download.

However, you don't always want to download the contents of an entire repository with
[`snapshot_download`]. You can use the `allow_regex` and `ignore_regex` parameters to
filter which files are downloaded; for example, you can ignore the `.msgpack` and `.h5` file extensions.
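As a rough sketch of how such filtering behaves, the function below is a hypothetical re-creation for illustration only; in practice the patterns are applied by [`snapshot_download`] itself, e.g. `snapshot_download(repo_id, ignore_regex=["*.msgpack", "*.h5"])`:

```python
from fnmatch import fnmatch

def keep_file(path, allow_patterns=None, ignore_patterns=None):
    # Hypothetical re-creation of glob-style filtering, for illustration only.
    # A file is kept if it matches at least one allow pattern (when given)
    # and matches no ignore pattern (when given).
    if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
        return False
    if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
        return False
    return True

files = ["config.json", "pytorch_model.bin", "flax_model.msgpack", "tf_model.h5"]
print([f for f in files if keep_file(f, ignore_patterns=["*.msgpack", "*.h5"])])
# ['config.json', 'pytorch_model.bin']
```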

Passing a regex can be especially useful when repositories contain files that are never
expected to be downloaded by [`snapshot_download`].

2 changes: 0 additions & 2 deletions docs/source/package_reference/file_download.mdx

[[autodoc]] huggingface_hub.snapshot_download

[[autodoc]] huggingface_hub.cached_download
> Contributor: Why are we removing this?
>
> Member (author): It's deprecated so I wouldn't promote the method in the package reference anymore. Do you feel strongly otherwise?

[[autodoc]] huggingface_hub.hf_hub_url

## Caching
78 changes: 31 additions & 47 deletions src/huggingface_hub/README.md

## Download files from the Hub

The `hf_hub_download()` function is the main function to download files from the Hub. One
advantage of using it is that files are cached locally, so you won't have to
download the files multiple times. If there are changes in the repository, the
files will be automatically downloaded again.


### `hf_hub_download`

The function takes the following parameters, downloads the remote file,
stores it to disk (in a version-aware way) and returns its local file path.

Parameters:
- a `repo_id` (a user or organization name and a repo name, separated by `/`, like `julien-c/EsperBERTo-small`)
hf_hub_download("lysandre/arxiv-nlp", filename="config.json")

### `snapshot_download`

Using `hf_hub_download()` works well when you know which files you want to download;
for example a model file alongside a configuration file, both with static names.
There are cases in which you will prefer to download all the files of the remote
repository at a specified revision. That's what `snapshot_download()` does. It
Parameters:
- a `cache_dir` which you can specify if you want to control where on disk the
files are cached

### `hf_hub_url`

Internally, the library uses `hf_hub_url()` to return the URL to download the actual files:
`https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin`


Parameters:
- a `repo_id` (a user or organization name and a repo name separated by a `/`, like `julien-c/EsperBERTo-small`)
- a `filename` (like `pytorch_model.bin`)
- an optional `subfolder`, corresponding to a folder inside the model repo
- an optional `repo_type`, such as `dataset` or `space`
- an optional Git revision id (can be a branch name, a tag, or a commit hash)

If you check out this URL's headers with an HTTP `HEAD` request (which you can do
from the command line with `curl -I`) for a few different files, you'll see
that:
- small files are returned directly
- large files (i.e. the ones stored through
[git-lfs](https://git-lfs.github.com/)) are returned via a redirect to a
Cloudfront URL. Cloudfront is a Content Delivery Network, or CDN, that ensures
that downloads are as fast as possible from anywhere on the globe.

<br>

## Publish files to the Hub
With the `HfApi` class there are methods to query models, datasets, and metrics:
- `list_datasets()`
- `dataset_info()`
- `get_dataset_tags()`

- **Spaces**:
- `list_spaces()`
- `space_info()`

These lightly wrap around the API Endpoints. Documentation for valid parameters and descriptions can be found [here](https://huggingface.co/docs/hub/endpoints).

