Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Derived Catalog: test for all needed variables and skip if existing #441

Merged
merged 2 commits into from
Feb 2, 2022

Conversation

aulemahal
Copy link
Contributor

@aulemahal aulemahal commented Feb 1, 2022

Change Summary

When updating datasets with derived variable calculations, the current code calls the derived variable as soon as at least one dependency is present in the dataset. It also overwrites variables if they are already present.

This PR ensures that all dependencies are present before computing the derived cat. What would have resulted in an error before now is silently skipped.

This PR also skips the calculation if a variable of the same name already exists. The use case here is when a variable is missing on some datasets but not others. For example "tas", it's better to use a native "tas" than one derived from "tasmin" and "tasmax".

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

@aulemahal aulemahal requested a review from andersy005 as a code owner February 1, 2022 21:53
@aulemahal aulemahal changed the title Test for all needed variables and skip if existing Derived Catalog : test for all needed variables and skip if existing Feb 1, 2022
@andersy005 andersy005 added the enhancement Issues that are found to be a reasonable candidate feature additions label Feb 1, 2022
@andersy005
Copy link
Member

This PR also skips the calculation if a variable of the same name already exists.

I'm curious... what should happen if the user wants to override the existing variable from the catalog i.e. they want to use their own derived variable?

@aulemahal
Copy link
Contributor Author

Well, you're right that this wouldn't work anymore, and some might see that as a problem.
My view was that the derived catalog is there to compensate for missing variables, rather than to change their definition. Thus the change felt justified (and is useful is my use of intake-esm). However, this is only my view. If you feel the previous behaviour was better, I can remove the added line.

@andersy005
Copy link
Member

Thus the change felt justified (and is useful is my use of intake-esm).

I agree with you.

However, this is only my view. If you feel the previous behaviour was better, I can remove the added line.

How about we keep your changes as the default behavior and maybe allow users to override this behavior via a flag/argument passed to the Registry.register()?

def register(
self,
func: typing.Callable,
*,
variable: str,
query: typing.Dict[pydantic.StrictStr, typing.Union[typing.Any, typing.List[typing.Any]]],

I imagine an argument along these lines

 prefer_derived : bool, optional (default=False)
           Specify whether to use a "derived variable" regardless if 
           a variable by the same name is found in the catalog.

Let me know what you think... 👍🏽 for merging this as is and addressing my suggestion in future PR if need be.

@aulemahal
Copy link
Contributor Author

I agree with your idea! Added in upcoming commit.

@andersy005
Copy link
Member

Awesome!! Thank you!

@andersy005 andersy005 changed the title Derived Catalog : test for all needed variables and skip if existing Derived Catalog: test for all needed variables and skip if existing Feb 2, 2022
@andersy005 andersy005 merged commit 65089aa into intake:main Feb 2, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement Issues that are found to be a reasonable candidate feature additions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants