Use Case: Intercomparison
Implementation: #4
It is quite common for scientists to wish to compare multiple datasets that measure related quantities. There is a huge variety of methods for doing this, but a great deal of value can be gained from implementing a few simple cases. These include comparing grids with other grids, and comparing grids with profiles, timeseries etc.
Use cases include comparing EN3 hydrographic or CEH soil moisture profiles with model data, highlighting areas of greatest discrepancy.
The general pattern for an intercomparison task would look like this:
- Identify two datasets for comparison
- Decide which variables in each dataset are to be compared (potential matches could be auto-suggested based on metadata)
- Select a spatiotemporal region over which the intercomparison will take place
- Select an algorithm/statistic
- Plot the results
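The general pattern above can be sketched as a small pipeline. This is a minimal illustration, not a real MELODIES API: the dict-of-arrays dataset representation, the region selection by index slices, and the `mean_difference` statistic are all assumptions made for the sketch.

```python
import numpy as np

def intercompare(ds_a, ds_b, var_a, var_b, region, statistic):
    """Compare one variable from each dataset over a common region.

    `region` is a (slice, slice) pair selecting the same lat/lon window
    from both (already co-registered) gridded fields.
    """
    field_a = ds_a[var_a][region]
    field_b = ds_b[var_b][region]
    return statistic(field_a, field_b)

def mean_difference(a, b):
    """An example statistic: mean of the difference field."""
    return float(np.mean(a - b))

# Usage: two toy 4x4 gridded "datasets" compared over a 2x2 sub-region
model = {"sst": np.full((4, 4), 290.0)}
obs = {"sea_surface_temperature": np.full((4, 4), 289.0)}
window = (slice(0, 2), slice(0, 2))
print(intercompare(model, obs, "sst", "sea_surface_temperature",
                   window, mean_difference))  # -> 1.0
```

In practice the two grids would rarely share a common grid, so a regridding step would precede the statistic, and the variable pairing would come from the metadata-matching step described above.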
Examples are:
- Subtracting one gridded field from another, resulting in another gridded field, or "difference field" (one of the fields may have to be regridded to match the other)
- Producing statistics of the difference field (e.g. histogram of the field)
- Comparing models with in situ observations. Usually this involves:
  - Iterating over each observation
  - Extracting a "pseudo observation" from the grid at the same location as the observation
  - Computing a statistic that compares the observation with the pseudo-observation (e.g. root mean square error)
  - Plotting the value of the statistic at the location of the original observation
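The grid-versus-observations pattern in the third example can be sketched as follows. The nearest-neighbour extraction and the (lat, lon) grid layout are assumptions for illustration; a real implementation might interpolate rather than take the nearest cell.

```python
import numpy as np

def pseudo_observation(grid, lons, lats, obs_lon, obs_lat):
    """Extract a "pseudo observation": the nearest-neighbour value of a
    2-D (lat, lon) field at an in-situ observation's location."""
    i = int(np.argmin(np.abs(lats - obs_lat)))
    j = int(np.argmin(np.abs(lons - obs_lon)))
    return grid[i, j]

def rmse(model_values, obs_values):
    """Root mean square error between paired values."""
    diff = np.asarray(model_values) - np.asarray(obs_values)
    return float(np.sqrt(np.mean(diff ** 2)))

# Usage: a toy 3x3 field and two observations as (lon, lat, value)
lons = np.array([0.0, 1.0, 2.0])
lats = np.array([50.0, 51.0, 52.0])
grid = np.arange(9.0).reshape(3, 3)  # field[i, j] = 3*i + j

obs = [(0.1, 50.1, 0.5), (1.9, 51.8, 7.5)]
pseudo = [pseudo_observation(grid, lons, lats, lo, la) for lo, la, _ in obs]
print(rmse(pseudo, [v for _, _, v in obs]))  # -> 0.5
```

Each per-observation statistic could then be plotted as a coloured point at the observation's location, as described above.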
I envisage that the system would suggest appropriate algorithms/statistics based on the types/geometries of the coverages involved.
An example of comparing EN3 data with model data can be found in the "OceanDIVA" application. Unfortunately the demo for this is no longer live but there's a slide in this presentation and a more detailed description in this paper. (We would not need to implement all the functionality of OceanDIVA in MELODIES!)
We will restrict this use case for now to the third example above, since this is relevant to multiple partners. In particular, soil moisture profiles shall be compared to a model grid. This means there is a 4D model grid (lon-lat-depth-time) and a collection of vertical profiles where each profile has a fixed time and location, and varies over depth.
There should be a time slider which controls the model time step. Any profiles within a certain range of this time step should be displayed as coloured points. The model field may also be displayed, provided there is a slider to select the depth. The point colour should either represent the measured value (if a depth is selected) or indicate the RMS error over the whole depth range relative to a "virtual profile" extracted from the model grid at that location. Clicking on a profile point should display a plot of the "virtual profile" alongside the in-situ profile, together with more detailed RMS statistics.
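The "virtual profile" comparison can be sketched as below. It assumes, for illustration only, that the model is a NumPy array indexed (time, depth, lat, lon) and that the profile's depths coincide with the model levels; real profiles would need vertical interpolation onto a common depth axis.

```python
import numpy as np

def virtual_profile(model, lons, lats, t_index, obs_lon, obs_lat):
    """Extract the model's depth column nearest to a profile location."""
    i = int(np.argmin(np.abs(lats - obs_lat)))
    j = int(np.argmin(np.abs(lons - obs_lon)))
    return model[t_index, :, i, j]  # values over the whole depth range

def profile_rms(virtual, observed):
    """RMS error over the depth range, used to colour the map point."""
    return float(np.sqrt(np.mean((virtual - observed) ** 2)))

# Usage: 1 time step, 3 depth levels, 2x2 horizontal grid
model = np.zeros((1, 3, 2, 2))
model[0, :, 0, 0] = [10.0, 9.0, 8.0]  # column at lat=50, lon=0
lons, lats = np.array([0.0, 1.0]), np.array([50.0, 51.0])

vp = virtual_profile(model, lons, lats, 0, obs_lon=0.1, obs_lat=50.2)
print(profile_rms(vp, np.array([10.5, 9.5, 8.5])))  # -> 0.5
```

When the user clicks a point, the same `virtual_profile` column would be plotted alongside the in-situ profile rather than reduced to a single RMS value.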
An alternative way to compare the data would be to fix a certain depth; clicking on a profile point would then display a time-series plot of the model together with all nearby profiles (constrained to the given depth). This seems more challenging, however, and should only be attempted if time allows.
Data volume can be an issue, particularly with model data. It's not clear that you would want to load a multidimensional model field into the browser and do the comparison client-side.
Getting the user interface right is also problematic. In particular, if you want to select two gridded fields for comparison, you can't plot them with one overlain on the other, as they are both (generally) opaque. However, when comparing a grid with point observations it's easier to display them both at the same time.
Linked Data techniques could be used to:
- Discover remote datasets that could be used in a comparison (perhaps a previous user has created an annotation that marks the two datasets as somehow related)
- Automatically suggest matching (or comparable) variables from each dataset, based on their metadata (such as looking for related observed properties)
- Record the results of intercomparisons as annotations, saved as Linked Data in a system like CHARMe
- Suggest suitable algorithms/statistics based on metadata such as observedProperty (if we know that certain algorithms are commonly applied to certain kinds of variables)
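The metadata-based variable matching mentioned above could work roughly as follows. The dict-based metadata representation and the vocabulary URIs are illustrative assumptions; a real system would compare observed-property terms from Linked Data descriptions, possibly including related (not just identical) properties.

```python
def suggest_matches(vars_a, vars_b):
    """Return (name_a, name_b) pairs whose observed-property
    annotations are identical."""
    matches = []
    for name_a, prop_a in vars_a.items():
        for name_b, prop_b in vars_b.items():
            if prop_a == prop_b:
                matches.append((name_a, name_b))
    return matches

# Usage: variable name -> observed-property URI (hypothetical vocabulary)
model_vars = {"soil_m": "http://vocab.example/prop/soil_moisture"}
obs_vars = {"sm_profile": "http://vocab.example/prop/soil_moisture",
            "temp": "http://vocab.example/prop/air_temperature"}
print(suggest_matches(model_vars, obs_vars))  # -> [('soil_m', 'sm_profile')]
```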