2024 CTSM PPE Meeting Notes
Agenda: Results from the new CLM6 PPE and an update on the history matching process.
Notes:
- Adrianna: questions from FATES AGU PPE poster
- training/testing emulators on separate forcing datasets (model verification vs. validation): is this a fair test? Charlie: if we select a parameter set that works across forcing datasets, is it climate-invariant, and are we thereby introducing a new bias? We don't want to calibrate the model to an overly specific climate, and we also need to think about eventual coupling to an atmosphere.
- getting good parameter sets from the PPE rather than an inflated ensemble: were we just lucky? Structural error complicates this; sometimes emulator error is low.
- Linnia: offline tuning in coupled model results
- goal: bring down high LAI bias in Amazon
- tuned land parameter set in F-case looks reasonable, but in B-case looks bad
- what happened? Bringing LAI down via WUE increases ET, introducing behavior where the Amazon is dying off
- B case compared to CRU has drying effect in Amazon
- If LAI had been tuned differently e.g., thru carbon/nitrogen, would we see this? Testing it now.
- idea: run an F-case with B-case SSTs? (Charlie)
- Abby: the dry Amazon bias is also not new, and is a problem across models; Will: yes, but the CESM bias has gotten worse
- Rosie: using an SP case here with reasonable Amazon fluxes? this bias is really problematic for terrestrial carbon cycle and how do we determine prioritization for addressing biases.
- Ben: we're not in control of all the levers but can we do some selected offline tests? role of land surface in modulating rainfall
- Abby: Claire's runs have slab ocean so somewhat in between F and B cases
- Rosie: what's worse, a high LAI bias offline or a dead Amazon online? How do we tune the climate model when the background state is changing? We want the parameterized forest in the same climate-sensitivity space as the real world. An ensemble of simulations with rainfall perturbations to CRU?
- Linnia: Gradient in sensitivity via metrics?
- Ben: sample intermediate steps between F and B cases, is the response linear?
- Observational uncertainty
- how do we define and account for observational uncertainty? can we use an arbitrary percentage, as with the parameter ranges? (JULES does this) - though maybe not in both directions
- Ben: need to inflate uncertainty because it becomes a proxy for structural error
- GPP and ET tradeoffs pushing the model in different directions
- Abby: Most ET datasets are dependent on each other (Penman-Monteith)
- Daniel: imposing ecological rules on parameter spaces
- think about the parameters we're perturbing here
- History matching
- 56 parameters, large spread
- wave 1 looking at mean state of LAI, GPP, ET, biomass
- are these the right variables? also constrained by obs
- MODIS-derived biomes that match up with PFT distributions
- 90% train, 10% validation: could we play around with the years sampled for time mean metrics? or the regions?
- wave 2 can use additional metrics and/or variables (e.g., albedo)
- also intermediate test with CESM so our tuning doesn't hit the model too hard
- CTSM6 LHC ensemble results
- spread is smaller than CLM5 LHC
- ensemble members are less of a carbon source in CTSM6 vs. CLM5
- question about spinup and when to start
Agenda: CLM6-PPE status update
- CLM6 PPE status update (5min)
- Evaluating spin-up protocol (5min)
- Retraining the sparsegrid (25min)
- Parameter selection & range revisions (15min)
NOTES:
- Status update
- Check out github discussions for asynchronous discussion of parameters and variables for the next PPE
- Working towards CLM6 LHC ~November timeline
- Spin-up protocol
- What is a good number / standard criterion for pSASU spin-up? A global drift of 0.01-0.02 PgC/yr, and fewer than 3-5% of gridcells changing by more than 1 gC/m^2/yr (CLM5 paper, Keith's script; a rough version of this check is sketched after this list)
- Plan: double spin-up (40 yrs AD, 160 yrs SASU) and see what that gets us
- TOTECOSYSC is not as well spun up according to the <1 gC/m^2/yr metric
- Bug: transitions are jumping around, not initializing - will make an issue and work on solving this
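A minimal sketch of the equilibrium check described above, assuming annual-mean TOTECOSYSC (gC/m^2) in an xarray Dataset with a `year` dimension plus gridcell areas in m^2; the function and variable names are illustrative, not Keith's actual script.

```python
import numpy as np
import xarray as xr

def spinup_ok(ds: xr.Dataset, area: xr.DataArray,
              drift_tol_pgc=0.02, cell_tol_gc=1.0, frac_tol=0.05):
    # Per-gridcell carbon change over the last two years (gC/m^2/yr)
    dC = ds["TOTECOSYSC"].isel(year=-1) - ds["TOTECOSYSC"].isel(year=-2)
    # Fraction of gridcells still changing by more than ~1 gC/m^2/yr
    frac_changing = float((np.abs(dC) > cell_tol_gc).mean())
    # Global drift in PgC/yr (1 PgC = 1e15 gC)
    drift_pgc = float((dC * area).sum()) / 1e15
    return abs(drift_pgc) <= drift_tol_pgc and frac_changing <= frac_tol
```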
- Sparse grid
- New code base, new forcing dataset: old clustering is still working surprisingly well
- Do we need to recluster, and should we use the same algorithm?
- One reason to recluster would be PFT/biome representativeness
- Current clusters are based on h0 (gridcell) data, but we would like to analyze based on h1 (PFT) data
- We can also do something post-hoc to account for reclassifying based on distance metrics, but can't change the centroids already chosen (i.e., the sparse grid gridcells)
- Patch clustering complications: is there a better way to choose patches? use PFT number as a label for the clustering algorithm? is 2deg the right starting point? is there a way to account for both h0 and h1 data needs?
- Benefit of h1 clustering is obtaining PFT levers for calibration, but our obs targets are at gridcell level (case for h0)
- What about competition and how that maps onto model gridcells?
- Use proportion deciduous or proportion woody in the clustering algorithm? We have not changed the feature set.
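An illustrative sketch (not the actual clustering code) of reclustering with PFT fractions added to the feature set, as discussed above; the array names and shapes are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def recluster(climate_feats, pft_fracs, n_clusters=400, seed=0):
    # climate_feats: (n_gridcells, n_climate_vars); pft_fracs: (n_gridcells, n_pfts)
    X = StandardScaler().fit_transform(np.hstack([climate_feats, pft_fracs]))
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    # The gridcell nearest each centroid becomes a sparse-grid cell
    rep_cells = cdist(km.cluster_centers_, X).argmin(axis=1)
    return rep_cells, km.labels_
```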
- Selecting parameters
- 72 parameters in the "mini-OAAT"; now down to 58
- This OAAT was run in transient mode (vs. CLM5 OAAT in equilibrium)
- Distribution of parameters among model physics is uneven, but that is expected/ok
- Radiation
- parameters aren't independent
- idea: sample from species within a PFT? reduces 8 free parameters to 1, but may be hard to emulate (see the sketch after this list)
- we want radiation parameters to have levers on albedo
- is tuning all the radiation parameters interesting/useful? better to do in SP mode?
- idea: use stem parameters only and treat leaf parameters as fixed (i.e., already calibrated)
- what's the tuning goal? that will help inform the parameter selection/experimental design
- what are the observational targets here? soil color?
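A rough sketch of the "sample species within a PFT" idea above: draw one latent index and look up a coherent set of radiation parameters, reducing 8 free parameters to 1. The trait-table values are purely illustrative, not real optical properties.

```python
import numpy as np

# rows: hypothetical species within one PFT; columns: e.g., leaf/stem
# reflectance and transmittance in VIS/NIR bands (illustrative values)
SPECIES_TRAITS = np.array([
    [0.10, 0.45, 0.05, 0.25, 0.16, 0.39, 0.001, 0.001],
    [0.08, 0.42, 0.04, 0.22, 0.18, 0.41, 0.002, 0.001],
    [0.11, 0.48, 0.06, 0.28, 0.15, 0.37, 0.001, 0.002],
])

def sample_radiation_params(u: float) -> np.ndarray:
    """Map one uniform [0, 1) sample to a full 8-parameter vector."""
    idx = min(int(u * len(SPECIES_TRAITS)), len(SPECIES_TRAITS) - 1)
    return SPECIES_TRAITS[idx]
```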
- Decomposition
- wide ranges for these parameters [0.2, 0.8]; the OAAT found the model did OK across them
- nitrogen mineralization rate (`nmin`) is another variable to check, or integrated FUN costs (`npp_n_uptake`)
- TPU
- `tpu25ratio` is the top parameter lever on NPP - are we comfortable with this? model sensitivity to TPU raises the question of whether TPU should be in the model at all
- change the range and test? effectively turn it off?
- interactions between LUNA perturbations and when TPU becomes limiting
Agenda:
- Linnia will present the latest on the upcoming CLM6 tuning experiments
- Timeline and CLM6 status updates
- Parameter selection
- Retraining the sparsegrid
CLM6-PPE parameter list: https://docs.google.com/spreadsheets/d/1R0AybNR0YAmMDjRqp9oyUffDhKeAWv1QF4yWTHqiXXM/edit?usp=sharing
Notes:
- Timeline and updates
- On track getting CLM6 PPE running, working with SEs
- Adrianna is helping with mesh files and datm subsetting; performance is much slower using the full-grid datm (vs. the sparse grid)
- Thanks to Sean, now have a script that subsets datm file for GSWP3, working on CRU data now
- Ideally we would develop a tool that takes the sparse gridcells you want and outputs the mesh file, since this step must be repeated every time you use a different sparse grid
- Using list of constraints from previous meeting, with OAAT to identify parameters
- Planning on a new mini-OAAT with CLM6, working towards a new LHC (~1500 members) with CLM6 and a new parameter list using several rounds of history matching for calibration
- Parameter selection:
- Thinking beyond LAI; goals include tuning and UQ
- It is easier to start with a larger set of parameters than to add more later
- Running with crop model off, Sam will help with crop model tuning separately
- `tpu25ratio`: the fact that it has an effect (irrespective of range) is interesting (Rosie); some evidence for taking TPU out of land models (Charlie); only has a parameter effect at elevated CO2 (Daniel); can also ask Danica
- Acclimation: we might not want to treat all params as independent - can we span the overall range and sensitivity of acclimation by designing a specific sampling routine? But there is a question of why we would pre-select these relationships - is it actually needed? (Ben). Rosie has a spreadsheet that might allow us to play with these params independently; it may be safer to treat them independently first vs. trying to understand those relationships later, with some computational tradeoffs (Daniel); there is a benefit to reducing the dimensionality (Linnia)
- Phenology: need metrics to assess the importance of these params, plus other params we didn't include in the CLM5 OAAT; `soilpsi` is not linear, so be mindful of +/- percentages (Peter); be mindful of sparse grid limitations for these params (Will/Rosie); could try slope of LAI (Adrianna); maybe do a mini-OAAT to determine whether phenology should be included in CLM6 calibration
- FUN parameters are already perturbed together as KCN, which could serve as a code example for combining acclimation parameters (a minimal sketch of this pattern follows after this list); be mindful of the FUN bug in CLM5.1 - will include KCN in the mini-OAAT for that reason
- Hydrology: there are multiple ways to perturb pedotransfer function parameters, and it makes sense to perturb them independently; what are the tuning goals for these parameters? (Daniel) - relates to parameter range definition and the spatial dependence of parameter values (Guoqiang); may want to explore how to sample the parameter distributions (Daniel)
- `maximum_leaf_wetted_fraction` was found to be important in land-atmosphere interactions (Meg); it is the maximum value `fwet` can take (Sean)
- Snow: `n_melt_coef` is useful and its range could be expanded (Guoqiang); SNICAR parameters were useful for Arctic simulations (Will); `zsno` is probably only useful for coupled simulations and was in Claire's ensemble
- Decomposition: likely some groupings here, like the respiration fraction litter-to-SOM and SOM-to-SOM params; check on ranges (Will); check the `q10_mr` variable name in the code (Charlie); watch for making fractions too low (Charlie); check the new CLM6 coarse woody debris parameter `rf_cwd` (Will); `minpsi_hr` and `maxpsi_hr` could be influential (Charlie); think about running bundled params separately for OAAT; consider independence of the respiration fraction params (Katie R.)
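A minimal sketch, in the spirit of the existing KCN treatment of FUN parameters, of perturbing a bundle with one shared multiplier so it occupies a single LHC dimension; the parameter names and values here are placeholders.

```python
def scale_bundle(defaults: dict, multiplier: float) -> dict:
    """Apply one shared scaling factor to a group of related parameters."""
    return {name: value * multiplier for name, value in defaults.items()}

# e.g., one sampled dimension controls the whole (hypothetical) bundle
fun_defaults = {"kc_nonmyc": 0.15, "kn_nonmyc": 0.10}  # illustrative values
perturbed = scale_bundle(fun_defaults, multiplier=1.3)
```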
Linnia & Daniel
Discussion of experimental design for tuning CLM6
Objective 1: Tuning LAI in CLM6 for CESM3
Objective 2: Carbon cycle uncertainty
Joint with CLM Meeting [10 people in room, 19 on call]
Announcements
- Parameter estimation interest group: check the slack for discussion on the future of the group.
- CESM Workshop: June 10-13, including Land and Biogeochemistry Model Working Group sessions and a Machine Learning Cross-Working Group session. Register for in person participation by May 31 and online participation through the meeting.
- Workshop on Model Uncertainty for Weather and Climate Prediction, University of Oxford, 23-26 September 2024. Register your interest and submit abstracts by June 23.
Agenda
- Kachinga Silwimba (PhD student at Boise State Univ., ASP GVP visitor through end of May) presents his work on "History Matching with Gaussian Processes and Evidential Deep Neural Networks to Improve Total Water Storage Simulations in the Community Land Model"
Notes
CLM-PPE analysis for hydrological applications
- Tuning CLM Total Water Storage (TWS) to GRACE satellite anomalies
First objectives:
- Develop methodology for emulating timeseries and seasonality
- Test and compare multiple ML methods for emulating CLM
- Perform sensitivity tests (Fourier amplitude & Sobol sensitivity tests)
- Using the emulator to perform the sensitivity tests.
Longer term objective: Once an emulator is built and evaluated, the next step will be to perform history matching (in progress)
- GRACE TWS will be used as observational target data set.
Step 1: Augment the dataset to emulate the timeseries (specifically the seasonality) of TWS
- The PPE dataset was augmented to incorporate time
- Training data:
- X: input array (500 parameter sets * 100 years * 12 months = 600,000 samples)
- include month and year as predictors
- use cyclic encoding of month so that December and January are "close together" rather than 12 and 1 (see the sketch after this list)
- y: monthly TWS (1901-2000)
- Validation data: monthly TWS (2001-2014)
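A minimal sketch of the cyclic month encoding described above, so that December (12) and January (1) end up adjacent in feature space.

```python
import numpy as np

def encode_month(month):
    """Map month 1..12 to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * np.pi * (np.asarray(month) - 1) / 12.0
    return np.sin(angle), np.cos(angle)

# encode_month(12) ~ (-0.50, 0.87) sits next to encode_month(1) = (0.0, 1.0),
# whereas the raw values 12 and 1 are maximally far apart.
```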
Step 2: Compare three different tools for emulation: Gaussian Process, Deep Neural Network, and Evidential Neural Network
- Gaussian Processes:
- GP emulators do not scale well with larger datasets (impossible to train on 600k training samples)
- Trained a GP to emulate the annual mean TWS
- A Fourier Amplitude Sensitivity test (performed with the GP) showed the most influential parameters were related to ET and soil moisture (`fff`, `d_max`), as expected
DNN
- Performs well and shows high skill for predicting seasonality (2001-2014)
- Performed Fourier amplitude sensitivity test with DNN (same parameters are important as with GP)
Evidential DNN: "best of both GP and DNN" - scales well with large training datasets and provides an estimate of uncertainty in predictions.
- validation looks good
- partitions aleatoric and epistemic uncertainty
- Epistemic is larger (as expected)
- Sensitivity tests with EDNN show similar results to DNN and GP
Summary: Sobol and FAST sensitivity analyses show similar results (regardless of the emulator used; an illustrative Sobol run is sketched after this list)
- indicates robustness of sensitivity tests
Emulation of timeseries (seasonality) is working well with DNN and Evidential DNN
Evidential Deep Neural Network is a promising tool for PPE emulation
- scales well with large datasets
- provides uncertainty estimates in predictions (critical for history matching)
- partitions uncertainty into aleatoric and epistemic
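A hedged illustration of the kind of Sobol analysis compared above, using SALib; the parameter names, bounds, and stand-in emulator are placeholders, not the actual CLM emulator or ranges.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 2,
    "names": ["fff", "d_max"],               # two of the influential parameters
    "bounds": [[0.02, 5.0], [0.001, 0.05]],  # illustrative ranges
}

def emulator_predict(X):                     # stand-in for a trained emulator
    return 0.8 * X[:, 0] + np.sin(50.0 * X[:, 1])

X = saltelli.sample(problem, 1024)           # Saltelli design: N*(2D+2) rows
Si = sobol.analyze(problem, emulator_predict(X))
print(Si["S1"], Si["ST"])                    # first-order and total indices
```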
Next steps:
- Emulate the CLM-PPE TWS anomalies and evaluate them with GRACE
- Incorporate space (in addition to time) to emulation.
- Perform history matching.
Discussion:
- Can you apply the SOBOL test to EDNN uncertainty?
- maybe, something to try
- Why doesn't GP work for emulating timeseries?
- the dataset is too large (GP training cost scales cubically with the number of training points)
- Compare the uncertainty estimates from GP to EDNN
- GP uncertainty is sensitive to choice of kernel but a comparison would be useful.
- Evidential DNN captures expected epistemic uncertainty relative to magnitude of aleatoric
- Epistemic is a combination of uncertainty in hyperparameter tuning and sparse PPE sample
- Epistemic could be reduced by adding more PPE ensemble members
- Quantile transform may be useful when incorporating space.
[4 people in room, 16 on call]
Announcements
- Parameter estimation interest group talks in May:
- May 1st, 9am MT 2024: Oliver Dunbar, Environmental Science and Engineering, California Institute of Technology
- May 8th, 9am MT 2024: Anthony Bloom, JPL, California Institute of Technology. CARDAMOM: https://datashare.ed.ac.uk/handle/10283/864
Agenda
- Discussion of sparse/PFT grids and PFT interactions
- Sparse grid evaluation plots from ILAMB: https://www.ilamb.org/PPE/CLM/2021-02/
Notes
- Motivation: is the sparse grid still serving us?
- Also motivated by Adrianna's FATES work
- Large computational savings from sparse grid (14x faster than 2 degree)
- Why 400? Based on variable representativeness (relative to full grid) and what we could afford
- Use ILAMB plots to look at bias and tradeoffs with # gridcells
- Limitations
- Can be lacking PFT sampling
- Using default simulation for clustering
- Older code base (Note: updated initial condition files could be useful here: https://github.com/NCAR/LMWG_dev/issues/57)
- Current sparsegrid is fine for gridcell output, but now we are interested in PFT-level output
- Can we construct a similarity matrix at the patch level? Each PFT has its own similarity matrix?
- Would potentially increase # gridcells to run at each simulation
- Consider validation data as well, consider matching with obs
- Building an emulator could use idealized setup, then switch to more realistic setup for validation
- Alternatives
- Use current sparse grid
- Re-run clustering for PFT information
- Select gridcells for dominant PFTs
- Start with dominant and add co-dominant cells
- Full grid
- Idealized surface datasets
- Where do we put our uncertainty?
- Error in sparse grid propagates to calibration
- Error in PFT representation
- PFT interactions within gridcells for cluster analysis with PFT information - how do we account for this?
- Relationship between PFTs within a column
- Subset by common combinations of PFTs? Use those to seed the clustering
- Linnia: 1 PFT tends to co-exist with 1 or 2 other PFTs at most, so it seems like this is reasonable
- Daniel: run clustering on h1 data (patches), then you are selecting patches to run vs. selecting gridcells
- Adrianna: proportional area vs. dominance thresholds plots
- How can we pick different grids that map onto different PFTs?
- Defining dominance ecologically
- Defining co-dominance, secondary grids after running some initial simulations
- PFT interactions - might need additional simulations to diagnose these
- Iterative process of running clustering algorithm
[28 people on call]
Announcements
- Parameter estimation interest group has a few talks coming up, we will share the details with this group
Agenda
- Linnia presents updates on the CLM-PPE and discusses some of the implications of our experimental design.
Notes
- Wave 1 of history matching: focused on PFT x biome spatial aggregation, vary PFT parameters independently
- Some PFTs are harder to emulate / have greater emulator uncertainty
- Wave 0 (original LHC PPE) scaled PFT parameters uniformly
- Implausibility score rules out parts of the parameter space (see the sketch after this list)
- Ignoring structural uncertainty (for now)
- Sample from plausible sets: how best to do this?
- Goal is to reduce emulator uncertainty
- Used a near-latin-hypercube design to select 100 members to re-run in CLM (Wave 1)
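The standard history-matching implausibility measure, sketched for a single metric; the exact variance terms used in this workflow may differ, and structural uncertainty is omitted here to mirror the note above.

```python
import numpy as np

def implausibility(z_obs, emu_mean, var_obs, var_emu):
    """I(x) = |z - E[f(x)]| / sqrt(Var_obs + Var_emu)."""
    return np.abs(z_obs - emu_mean) / np.sqrt(var_obs + var_emu)

# Parameter sets with I(x) above a cutoff (commonly 3) are ruled implausible
keep = implausibility(z_obs=2.1, emu_mean=2.6, var_obs=0.04, var_emu=0.09) < 3.0
```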
- Wave 1 results
- Observational uncertainty varies across biomes
- Emulator uncertainty is dominating
- Wave 1 from CLM falls generally within uncertainty band (obs + emulator uncertainty)
- Varying PFT parameters independently works OK
- Issues to address
- Uncertainty: observational, forcing, emulator
- Sparse grid: PFT interactions
- Metrics: trend, IAV, seasonality
- Emulator uncertainty
- Adding 100 ensemble members is not reducing emulator uncertainty - why?
- Wave 1 allowed PFT parameters to vary independently, which matters when PFTs share gridcells
- Needleleaf dominated gridcell: adding wave0 and wave1 ensemble members decreases variance
- Needleleaf/broadleaf split gridcell: adding wave1 ensemble members does NOT decrease variance
- Test adding NL LAI as a predictor: adding wave1 ensemble members decreases variance (Ben: how sensitive is the performance to adding LAI from which wave?)
- Rosie: should we just focus on dominated gridcells?
- Rethinking sparse grid
- Option 1: Select gridcells with one PFT dominating (FATES approach)
- Option 2: Select new sparse grid with PFT as a factor
- Sparse grid redesign likely important to sampling enough PFT gridcells
- Adding targets that may not be aggregated by PFT (e.g., GPP) - do the data exist somewhere? Gordon: the GPP product has an underlying landcover classification; we could calculate FLUXCOM GPP using the CLM land surface dataset. Ben: useful variance information
- Andy: Try parameter selection sequentially based on PFTs in the gridcell? Conditional optimization in a sequence. Hydrology analogy would be calibrating snow first, fixing it, then calibrating soil parameters. (Adrianna is taking this approach. Learn more in dominant gridcells and then introduce a new grid partway through.) Linnia: Small parts of grass everywhere is challenging, keep it fixed?
- Adrianna: biome land area vs. "dominance threshold" for diagnosing spatial importance of a particular PFT. Each PFT would get its own set of gridcells (think about how many), consider observational uncertainty. [Great future discussion topic for this group!]
- Ben: spatial element to the calibration cascade, bringing in information in the data as well
- Aleya: what is a PFT interaction? All below ground interactions via moisture/nutrients because all PFTs in a gridcell share the same soil column
- Linnia: started looking at trend in LAI, Daniel has shown that LAI trend is an important constraint
- Andy: resampling from emulator results, different approaches to interpreting implausibility score
- Early waves may not want to focus the resampling too narrowly (to reduce emulator uncertainty), but later waves could consider this. Also consider how observational uncertainty maps onto interpreting implausibility. Draw on the hydrology community's experience.
- How do wave0 training members impact emulators? Add more to wave1? Throw out the worst of wave0?
Announcements
- We have a google group / email list! Feel free to add yourself and share with others.
- CESM Land Model Working Group meeting is February 27-29, register here.
- Lots of PPE talks!
- Linnia is running a PPE tag on Derecho! Thanks to Erik, Keith, Sam L., Daniel for making this happen.
- Thinking about LAI calibration at the PFT-biome level, working towards calibration for CLM6.
- Adrianna is working on FATES calibration cascade.
Agenda
- Khachik Sargsyan (DOE Sandia) and Daniel Ricciuto (DOE Oak Ridge) present on "Reduced-Dimensional Neural Network Surrogate Construction and Calibration of the E3SM Land Model." Slides here.
Notes
- Forward UQ: global sensitivity analysis & inverse UQ: model calibration
- Reduced dimensional surrogate construction
- Using satellite phenology version of ELM, will work with BGC model next
- 2 degree resolution, 275 members, 10 parameters
- Karhunen-Loeve expansion: an SVD-type dimensionality reduction, but centered and in continuous form (a minimal sketch appears after these notes)
- linear encoder of the output
- solving for eigen-features
- uncertain parameters and "certain" conditions
- working with latent space
- they have tried autoencoders, but it was not worth the effort; linear encoding is acceptable here
- neural network based surrogate, they do compare with GP but prefer combining outputs
- Evaluating at 96 FLUXNET sites, 180 months
- using different time averaging: monthly, monthly climo, seasonal climo, annual
- issues with memory and high dimensionality
- Retaining 8 eigenvalues (FLUXNET sites) or 11 (global) captures most of the variance
- Recommend residual NN (ResNet)
- fLNR is the most sensitive parameter; mbbopt (Ball-Berry stomatal conductance slope) is second
- Reference data is FLUXCOM GPP
- Posterior sampling via MCMC
- Likelihood in reduced space: project observed data to KL eigenspace to calibrate
- Also looked at local (site-specific) parameter posterior PDFs - how to interpret?
- Correlate PFT fractions with best fLNR values
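A minimal sketch of the linear (SVD-style) encode/decode step described in these notes: center the ensemble outputs, keep the leading modes as the latent space, and reconstruct. The shapes and random stand-in data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(275, 180))        # ensemble members x output dimensions
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)

k = 8                                   # retained modes (cf. 8-11 above)
latent = (Y - Y_mean) @ Vt[:k].T        # encode: project onto leading modes
Y_recon = latent @ Vt[:k] + Y_mean      # decode: back to output space
```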
Questions
- Linear approximation w/ PFT-output? Using aggregated output for now as a "compression".
- Issues with GPP calibration making another variable worse? Could add latent heat flux to the methodology via vectorization.
- Informing structural issues? Embedded model error approach - augment Bayesian likelihood for model inadequacy. Internal approaches to add statistical representations - some preliminary work.
- Performance at different sites? Surrogate performance does vary based on site, haven't looked at calibration yet.
- Losing information from PCA - is the KL method different? PCA does not center the data; formally this is not PCA, it is more like SVD.
- Choice of eigenvectors based on default? A different eigenbasis would impact this workflow.
- Model is forced with reanalysis but also looking at FLUXNET sites? FLUXCOM is the target.
Additional Questions (follow-up)
- Comparing NN vs. GP - with CLM we have found separate emulators work well.
- We have compared with PC (polynomial chaos - see some of the additional slides)
- A GP, like PC, would have to be built separately for each latent feature. Putting all outputs together might be a bit awkward, particularly if each GP is tuned to its own best hyperparameter setting (and tuning, I feel, might be required for a high-dimensional input space, e.g. 10 parameters).
- An advantage of GP or PC is that they have a natural way of quantifying the emulator error, which an NN does not readily have.
- Details on ResNet?
- We used standard ResNets, with x_{i+1} = x_i + F(x_i, w) as the equation from layer i to layer (i+1). The additional 'shortcut' term x_i makes a difference! See e.g. https://www.ksargsyan.net/files/talks/2023_06_uncecomp.pdf
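A toy numpy illustration of the residual update above; the actual model used standard ResNet layers, so this only shows the shortcut structure.

```python
import numpy as np

def residual_layer(x, W, b):
    # x_{i+1} = x_i + F(x_i, w): the added shortcut term is x itself
    return x + np.tanh(x @ W + b)
```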
- Quantifying the NN emulator error?
- not really, but there is a whole industry on quantifying NN emulator errors, with many methods like the Laplace approximation, MC-dropout, variational inference, etc. None of them are ideal or easy to train/interpret, though.
- Sensitivity metric?
- Sobol sensitivity indices that capture variance fraction due to a given parameter.
- How is the model prior defined?
- Nothing tricky so far, just uniform prior for each parameter in a range driven by literature/expert knowledge.
- In all this, the elephant in the room is the model error - there is no ideal way to tackle it of course, but the embedded approach (https://www.dl.begellhouse.com/journals/52034eb04b657aea,5a3895a14afb242f,1d2810e66490c327.html) is what we have been trying with some success.