Add a DerivedCatalog object to deal with derived variables #357
Comments
I took a stab at this. My current approach is similar to Matt's in that I'm keeping track of each derived variable's info in a registry attached to the catalog object. Initially, this derivedcat registry is empty:
In [1]: import intake, intake_esm
In [2]: cat = intake.open_esm_datastore("./tests/sample-collections/catalog-dict-records.json")
In [4]: cat.unique()
Out[4]:
component [atm]
frequency [daily]
experiment [20C]
variable [FLNS, FLNSC, FLUT, FSNS, FSNSC]
path [s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS...
derived_variable []
dtype: object
The user can register their derivation function via a decorator:
In [5]: @intake_esm.register_derived_variable(varname="FOO", required=[{'variable': "TEMP", "component": "ocn"}])
   ...: def func(ds):
   ...:     return ds.TEMP + 1
   ...:
The user should be able to validate the derived catalog whenever they want via:
In [9]: cat.validate_derivedcat()
Looks good!
This validation method looks like:
for key, entry in self.derivedcat.items():
    for req in entry.required:
        for col in req:
            if col not in self.esmcat.df.columns:
                raise ValueError(
                    f"{key} requires {col} to be in the ESM catalog columns: {self.esmcat.df.columns.tolist()}"
                )
        if self.esmcat.aggregation_control.variable_column_name not in req.keys():
            raise ValueError(
                f"Variable derivation requires *{self.esmcat.aggregation_control.variable_column_name}* to be in the dictionary of requirements: {req}"
            )
else:
    print('Looks good!')
Operations like cat.unique() now include the registered derived variable:
In [6]: cat.unique()
Out[6]:
component [atm]
frequency [daily]
experiment [20C]
variable [FLNS, FLNSC, FLUT, FSNS, FSNSC]
path [s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS...
derived_variable [FOO]
dtype: object
In [8]: cat.derivedcat
Out[8]: {'FOO': DerivedVariable(func=<function func at 0x1072dc310>, required=[{'variable': 'TEMP', 'component': 'ocn'}])}
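To make the registry idea above concrete, here is a minimal standalone sketch of how a decorator-based registry with this kind of validation could work. It is not the actual intake-esm implementation: `DerivedVariableRegistry`, `register`, and `validate` are hypothetical names, the catalog is reduced to a plain list of column names, and only the `DerivedVariable(func=..., required=...)` shape is taken from the output shown above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DerivedVariable:
    # Same two fields as in the repr shown above.
    func: Callable
    required: List[Dict[str, str]]


class DerivedVariableRegistry:
    """Hypothetical registry mapping a derived variable name to its recipe."""

    def __init__(self):
        self._entries: Dict[str, DerivedVariable] = {}

    def register(self, varname, required):
        """Return a decorator that stores the decorated function under `varname`."""
        def decorator(func):
            self._entries[varname] = DerivedVariable(func=func, required=required)
            return func  # the function itself stays usable unchanged
        return decorator

    def items(self):
        return self._entries.items()

    def validate(self, columns, variable_column_name="variable"):
        """Check requirement keys against the catalog columns (assumed to be a list of str)."""
        for key, entry in self._entries.items():
            for req in entry.required:
                for col in req:
                    if col not in columns:
                        raise ValueError(
                            f"{key} requires {col} to be in the catalog columns: {list(columns)}"
                        )
                if variable_column_name not in req:
                    raise ValueError(
                        f"Variable derivation requires *{variable_column_name}* "
                        f"to be in the dictionary of requirements: {req}"
                    )
        return True


registry = DerivedVariableRegistry()


@registry.register(varname="FOO", required=[{"variable": "TEMP", "component": "ocn"}])
def func(ds):
    # `ds` is assumed to support item access, e.g. a dict or an xarray Dataset.
    return ds["TEMP"] + 1


# Column names taken from the cat.unique() output above.
columns = ["component", "frequency", "experiment", "variable", "path"]
registry.validate(columns)
```

As in the validation method quoted above, this only checks that the requirement keys are catalog columns and that the variable column is present. It does not check whether the requested values (here `TEMP` in the `ocn` component) actually exist in the catalog.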
Cc @matt-long, @kmpaul, @mgrover1
Similar to the development in esds-funnel, we think it would be useful to be able to add "derived variables" to a catalog, accessible via an API similar to this: