Add functionality for derived variables #379

Merged: 10 commits, Oct 15, 2021

andersy005 (Member) commented Oct 14, 2021

Change Summary

  • Adds a derived.py module, which houses the data classes used for derived variables
  • Adds a derivedcat attribute to the main catalog object
  • Adapts the search() method to work with both the base (main) and derived catalogs
  • Adapts to_dataset_dict() to handle derived variables
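
The registry idea summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in derived.py: ToyRegistry, DerivedVariable, dependencies() and apply() are hypothetical names, and a plain dict stands in for an xarray.Dataset so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DerivedVariable:
    """Hypothetical record tying a derived variable to the inputs it needs."""
    func: Callable
    variable: str
    dependent_variables: List[str]


@dataclass
class ToyRegistry:
    """Illustrative stand-in; NOT intake_esm.DerivedVariableRegistry."""
    _registry: Dict[str, DerivedVariable] = field(default_factory=dict)

    def register(self, variable: str, dependent_variables: List[str]):
        """Return a decorator that records func under the given variable name."""
        def decorator(func):
            self._registry[variable] = DerivedVariable(func, variable, dependent_variables)
            return func
        return decorator

    def dependencies(self, requested: List[str]) -> List[str]:
        """Collect the base variables needed by the requested derived variables."""
        needed: List[str] = []
        for name in requested:
            if name in self._registry:
                for dep in self._registry[name].dependent_variables:
                    if dep not in needed:
                        needed.append(dep)
        return needed

    def apply(self, ds, requested: List[str]):
        """Run each registered function whose variable was requested."""
        for name in requested:
            if name in self._registry:
                ds = self._registry[name].func(ds)
        return ds


toy = ToyRegistry()


@toy.register(variable="FOO", dependent_variables=["FLNS", "FLUT"])
def make_foo(ds):
    ds["FOO"] = ds["FLNS"] + ds["FLUT"]
    return ds


@toy.register(variable="BAR", dependent_variables=["FLUT"])
def make_bar(ds):
    ds["BAR"] = ds["FLUT"] * 1000
    return ds


# A dict of scalars stands in for a dataset holding the base variables.
base = {"FLNS": 1.5, "FLUT": 2.0}
result = toy.apply(dict(base), ["FOO", "BAR"])
```

Keeping the dependency list next to each function is what would let a catalog know which base variables to load before computing a derived one.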

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

andersy005 added this to the Winter 2021 Release milestone Oct 14, 2021
andersy005 added the enhancement label Oct 14, 2021
andersy005 marked this pull request as ready for review October 14, 2021 23:27
andersy005 (Member, Author) commented Oct 14, 2021

This seems to be working quite well:

  • Create a local registry
In [1]: import intake

In [2]: import intake_esm

In [3]: registry = intake_esm.DerivedVariableRegistry()

In [4]: @registry.register(variable='FOO', dependent_variables=['FLNS', 'FLUT'])
   ...: def func(ds):
   ...:     ds['FOO'] = ds.FLNS + ds.FLUT
   ...:     return ds
   ...: 
   ...: @registry.register(variable='BAR', dependent_variables=['FLUT'])
   ...: def funcs(ds):
   ...:     ds['BAR'] = ds.FLUT * 1000
   ...:     return ds
   ...: 
  • Instantiate a catalog object
In [5]: cat = intake.open_esm_datastore("./tests/sample-collections/catalog-dict-records.json", registry=registry)

In [11]: cat.df
Out[11]: 
  component frequency experiment variable                                               path
0       atm     daily        20C     FLNS  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS....
1       atm     daily        20C    FLNSC  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNSC...
2       atm     daily        20C     FLUT  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLUT....
3       atm     daily        20C     FSNS  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNS....
4       atm     daily        20C    FSNSC  s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FSNSC...
  • For demo purposes, search for derived variables FOO and BAR only
In [6]: new_cat = cat.search(variable=['FOO', 'BAR'])

In [7]: new_cat
Out[7]: <aws-cesm1-le catalog with 1 dataset(s) from 2 asset(s)>
  • Load data into xarray
In [8]: ds = new_cat.to_dataset_dict(xarray_open_kwargs={'backend_kwargs': {'storage_options': {'anon': True}}})

--> The keys in the returned dictionary of datasets are constructed as follows:
        'component.experiment.frequency'
 |████████████████████████████████████████████████████████████████████████████████| 100.00% [1/1 00:00<00:00]
  • FOO and BAR are included in our datasets 🎉
In [9]: ds['atm.20C.daily']
Out[9]: 
<xarray.Dataset>
Dimensions:    (member_id: 40, time: 31390, lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * member_id  (member_id) int64 1 2 3 4 5 6 7 8 ... 34 35 101 102 103 104 105
  * time       (time) object 1920-01-01 12:00:00 ... 2005-12-31 12:00:00
    time_bnds  (time, nbnd) object dask.array<chunksize=(15695, 2), meta=np.ndarray>
Dimensions without coordinates: nbnd
Data variables:
    FLNS       (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    FLUT       (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    FOO        (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
    BAR        (member_id, time, lat, lon) float32 dask.array<chunksize=(1, 576, 192, 288), meta=np.ndarray>
Attributes: (12/15)
    Conventions:                  CF-1.0
    NCO:                          4.4.2
    Version:                      $Name$
    important_note:               This data is part of the project 'Blind Eva...
    initial_file:                 b.e11.B20TRC5CNBDRD.f09_g16.001.cam.i.1920-...
    logname:                      mudryk
    ...                           ...
    title:                        UNSET
    topography_file:              /scratch/p/pjk/mudryk/cesm1_1_2_LENS/inputd...
    intake_esm_attrs/component:   atm
    intake_esm_attrs/frequency:   daily
    intake_esm_attrs/experiment:  20C
    intake_esm_dataset_key:       atm.20C.daily
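
One plausible reading of the session above is that a search for FOO and BAR is resolved to the base variables they depend on, which would explain why the result reports 2 assets (FLNS and FLUT) and why both appear in the loaded dataset. The sketch below illustrates that reading only; resolve() and the two containers are hypothetical and are not taken from this PR's implementation.

```python
# Variables listed in the sample catalog shown in the session.
catalog_variables = ["FLNS", "FLNSC", "FLUT", "FSNS", "FSNSC"]

# Derived variable -> base variables it depends on (from the registry cell).
dependencies = {"FOO": ["FLNS", "FLUT"], "BAR": ["FLUT"]}


def resolve(query):
    """Map a variable query onto catalog rows (hypothetical helper).

    Derived names are expanded to their dependencies; any other name
    is looked up as-is.
    """
    wanted = set()
    for name in query:
        wanted.update(dependencies.get(name, [name]))
    # Preserve catalog order in the result.
    return [v for v in catalog_variables if v in wanted]


matched = resolve(["FOO", "BAR"])
print(matched)  # ['FLNS', 'FLUT']
```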

Cc @kmpaul, @mgrover1, @matt-long... I'm going to merge this tomorrow unless there's any objection.
