Investigate Xmip for preprocessing CMIP6 data prior to data treatment #138

Open
Zeitsperre opened this issue Jun 30, 2023 · 1 comment
Labels: data bug (Bug with data cleaned by miranda), enhancement (New feature or request)

Comments

@Zeitsperre
Collaborator

Proposal

CMIP6 data sometimes requires additional cleaning or treatment to remove known issues (e.g. extra weeks of data, specific errors in values or metadata, inconsistently named coordinates). Issues in our existing CMIP6 data stores are difficult to track and annoying to correct, and Miranda's existing data cleaning approach is ill-suited to handling these sparse errors.

While other tools for collecting CMIP6 data (such as esgpull) should also be explored, we shouldn't be trying to reinvent the wheel, especially for a project as large and well-supported as CMIP6.

Approach

Xmip should be leveraged for this step. This could be built into Miranda as another method or submodule specifically for preprocessing (miranda.preprocessing.cmip?).

Xmip provides a post-processing module that might be of interest to xscen for building scenarios. To be determined.

@Zeitsperre added the enhancement (New feature or request) and data bug (Bug with data cleaned by miranda) labels on Jun 30, 2023
@juliettelavoie
Collaborator

Definitely a lot of interesting features in xmip!
I think many of the hard-coded issues and fixes in the preprocessing module target oceanography, so they concern variables and experiments that we don't use often. Still, it makes sense to contribute to xmip and have miranda wrap it, rather than implementing the fixes separately in miranda.

As for post-processing, I think we already solve a lot of the combination problems with extract_dataset and .to_dataset. I'm not convinced we should add it to xscen until we really need it.
We also already handle grids using xesmf.
