add driver and simplecache example #20

Merged
merged 22 commits into from
Feb 17, 2021
127 changes: 126 additions & 1 deletion docs/source/tutorial.ipynb
@@ -728,7 +728,7 @@
"metadata": {},
"source": [
"The `to_dask()` reads only metadata needed to construct an ``xarray.Dataset``. The actual data are streamed over the network when computation routines are invoked on the dataset. \n",
- "By default, intake-thredds uses ``chunks={}`` to load the dataset with dask using a single chunk for all arrays. You can use a different chunking scheme by prividing a custom value of chunks before calling `.to_dask()`:"
+ "By default, `intake-thredds` uses ``chunks={}`` to load the dataset with dask using a single chunk per array. You can use a different chunking scheme by providing a custom `chunks` value before calling `.to_dask()`:"
]
},
{
@@ -2009,6 +2009,131 @@
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Caching\n",
"\n",
"Under the hood, `intake-thredds` uses the `opendap` driver from `intake-xarray` by default. You can instead choose\n",
"`driver='netcdf'`, which, combined with `fsspec`, caches the files locally when you prepend `simplecache::` to the catalog URL;\n",
"see https://filesystem-spec.readthedocs.io/en/latest/features.html#remote-write-caching."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sources:\n",
" thredds_merged:\n",
" args:\n",
" driver: netcdf\n",
" path:\n",
" - Datasets\n",
" - ncep.reanalysis.dailyavgs\n",
" - surface\n",
" - air.sig995.194*.nc\n",
" url: simplecache::https://psl.noaa.gov/thredds/catalog.xml\n",
" description: ''\n",
" driver: intake_thredds.source.THREDDSMergedSource\n",
" metadata:\n",
" fsspec_pre_url: 'simplecache::'\n",
"\n"
]
}
],
"source": [
"import os\n",
"\n",
"import fsspec\n",
"\n",
"# configure fsspec's simplecache: store cached files in 'my_caching_folder' under their original names\n",
"fsspec.config.conf['simplecache'] = {'cache_storage': 'my_caching_folder', 'same_names': True}\n",
"\n",
"cat_url = 'https://psl.noaa.gov/thredds/catalog.xml'\n",
"source = intake.open_thredds_merged(\n",
" f'simplecache::{cat_url}',\n",
" path=['Datasets', 'ncep.reanalysis.dailyavgs', 'surface', 'air.sig995.194*.nc'],\n",
"    driver='netcdf',  # use the netcdf driver, which opens files via the THREDDS HTTPServer service\n",
")\n",
"print(source)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Dataset(s): 100%|████████████████████████████████| 2/2 [00:10<00:00, 5.44s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 875 ms, sys: 186 ms, total: 1.06 s\n",
"Wall time: 19.1 s\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"%time ds = source.to_dask()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "biblical-diana",
"metadata": {},
"outputs": [],
"source": [
"assert os.path.exists('my_caching_folder/air.sig995.1949.nc')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10 µs, sys: 1e+03 ns, total: 11 µs\n",
"Wall time: 12.9 µs\n"
]
}
],
"source": [
"# once the data are cached, this call is very fast\n",
"%time ds = source.to_dask()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.rmtree('my_caching_folder')"
],
"cell_type": "code",
"execution_count": null,
"id": "biblical-diana",
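The `simplecache::` prefix used in the new Caching cells is ordinary `fsspec` URL chaining. The sketch below shows the same mechanism without network access. It assumes fsspec's built-in `memory://` filesystem as a stand-in for the THREDDS HTTP server and a temporary directory in place of `my_caching_folder`; the file contents are placeholder bytes, not a real netCDF file. Unlike the notebook, which sets `fsspec.config.conf['simplecache']` globally, it passes the cache options per call through the `simplecache=` keyword.

```python
import os
import tempfile

import fsspec

# Assumption: fsspec's in-memory filesystem stands in for the remote
# THREDDS HTTPServer so this sketch runs offline. Placeholder bytes only.
with fsspec.open("memory://air.sig995.1949.nc", "wb") as f:
    f.write(b"placeholder bytes")

# Local directory playing the role of 'my_caching_folder' in the notebook.
cache_dir = tempfile.mkdtemp()

# Prepending 'simplecache::' chains a whole-file local cache in front of the
# target filesystem; options for that layer are keyed by its protocol name.
with fsspec.open(
    "simplecache::memory://air.sig995.1949.nc",
    mode="rb",
    simplecache={"cache_storage": cache_dir, "same_names": True},
) as f:
    data = f.read()

# With same_names=True the cached copy keeps the original file name,
# mirroring the notebook's check on 'my_caching_folder/air.sig995.1949.nc'.
cached_path = os.path.join(cache_dir, "air.sig995.1949.nc")
print(os.path.exists(cached_path), data)
```

Opening the same chained URL a second time is served from the local copy in `cache_dir` rather than from the target filesystem.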