Introduce prognostic updraft velocity to saSAS and C3 convection schemes #2567

Open

wants to merge 24 commits into base: develop
Conversation

@lisa-bengtsson
Contributor

lisa-bengtsson commented Jan 23, 2025

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This development incorporates a new prognostic updraft velocity equation into the saSAS deep and shallow convection schemes. It is introduced as a module and can easily be added to the C3 scheme as well once the modularized C3 code is in place.

The equation describes the time evolution of the convective updraft velocity, with buoyancy as a source term, quadratic damping due to aerodynamic drag and mixing/diffusion, and linear damping from wind shear.

The equation is discretized using an implicit approach and solved as a quadratic equation. For implementation details, please see: https://docs.google.com/document/d/13VH7DV4erJcuuF_-dUGpplk9uYHF6dalfx8l1fAIxe8/edit?tab=t.0
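
For reference, a minimal sketch of the kind of implicit discretization described above, written with placeholder symbols (a for the buoyancy scaling, c1 and c2 for the quadratic and linear damping coefficients; these are illustrative, not the variable names used in the code):

```latex
% time evolution of the convective updraft velocity w, with buoyancy source B
\frac{\partial w}{\partial t} = a\,B - c_1\,w^{2} - c_2\,w

% backward-Euler (implicit) step of length \Delta t, rearranged into a quadratic in w^{n+1}
c_1\,\Delta t\,\bigl(w^{n+1}\bigr)^{2} + \bigl(1 + c_2\,\Delta t\bigr)\,w^{n+1}
  - \bigl(w^{n} + a\,B\,\Delta t\bigr) = 0
```

The physically meaningful (non-negative) root of this quadratic gives the updated updraft velocity.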

The prognostic updraft velocity is used in the prognostic closure if progsigma = true and progomega = true; it also replaces the diagnostic updraft velocity in the adjustment time-scale computation if progomega = true (regardless of progsigma). The scheme is implemented here with the default setting progomega = false for further testing and evaluation, so no regression tests should fail and no new baselines are required.
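
As an illustration only, the two flags might be combined in a namelist fragment like the one below; the group name and placement are assumptions for readability and should be taken from the PR itself:

```fortran
! illustrative fragment -- group name and placement are assumptions, not taken from this PR
&gfs_physics_nml
  progsigma = .true.    ! prognostic area-fraction closure
  progomega = .false.   ! prognostic updraft velocity; .false. is the default in this PR,
                        ! so existing baselines are reproduced
/
```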

Commit Message:

* UFSWM - 
  *
  * FV3
    * ccpp-physics

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

New baselines needed for the following tests:

cpld_control_p8_faster intel
control_p8_faster intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • GaeaC5
    • GaeaC6
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@lisa-bengtsson
Contributor Author

@jkbk2004 I keep seeing failures in these four tests, even though you can successfully run them with this PR:
cpld_control_p8_faster intel
control_p8_faster intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel

I don't know how to upload a test_changes.list and move forward with this PR. If you have time, could you help me with the regression tests? Thank you.

@grantfirl
Collaborator

@lisa-bengtsson I can try to do this on Hera if you'd like.

@lisa-bengtsson
Contributor Author

@grantfirl thank you, that would be much appreciated!

@grantfirl
Collaborator

OK, RTs running on Hera right now. FYI, I did need to bring the latest develop branches into your fv3atm and ufs-weather-model branches. Perhaps that was the issue? I'll let you know when it's finished.

@lisa-bengtsson
Contributor Author

@grantfirl thanks, I have updated them now and will try again. I've had some trouble with the same four cases for a couple of weeks; @jkbk2004 ran them earlier and confirmed that they passed, so I don't know why I'm having trouble with them on my end. I'd be interested to see if your RT suite passes.

@grantfirl
Collaborator

grantfirl commented Feb 28, 2025

@lisa-bengtsson I'm seeing the same failures. The runs completed, but the results were not identical.

@grantfirl
Collaborator

@lisa-bengtsson If you really don't think that those tests should produce different results, it's possible that the baselines on Hera for those tests are wrong. As soon as the entire rt.conf finishes up, I'll check out the top of develop and run only one of the failing tests to see if the baselines are OK.

@grantfirl
Collaborator

@lisa-bengtsson So, the develop branch doesn't fail, at least for the control_p8_faster intel test, which means that the baselines are probably OK. The way I see it, there are a couple of hypotheses:

  1. All of the failing tests have the -DFASTER=ON cmake flag added, which increases optimization and adds some other flags. This is detailed in this PR. It would probably be good for someone who understands the impact of these flags (at least @SamuelTrahanNOAA) to review this PR and its submodule PRs to see whether any of the code changes introduced would trigger a result change due to the -DFASTER flags.
  2. I wonder whether the tests are not reproducing because of something related to the recent PR Minor modification and bug fix in GFS cumulus convection schemes #2487. @rhaesung was able to debug and find the culprit for that reproducibility issue. I think that the changes were confined to Minor modification and bug fix in GFS cumulus convection schemes ccpp-physics#232. It might be worth a look to see whether similar changes need to be made in this PR to have reproducible results.

Anyway, these are my best 2 hypotheses at the moment. One or both could be wrong!

@SamuelTrahanNOAA
Collaborator

The most common cause is numbers that are close to, but not exactly, zero. These can cause problems when a change in optimization rounds subnormal (AKA denormal) numbers to zero, and also when physics is switched to single precision. A good way to detect problems like this is -DDEBUG=YES -DCCPP_32BIT=YES.

Look for the following; a short sketch of these patterns follows the list:

  1. Comparisons to zero. You should use epsilons instead: if(abs(y) < 1e-20) instead of if(y==0).
  2. Dividing by a number that is likely to be nearly zero. Instead, set a lower bound and do something like this: y = 1 / max(1e-9, x) instead of y = 1 / x.
  3. Calling math functions and passing arguments that are nearly zero. Solve this similarly to item 2: y = log(max(1e-9, x)) instead of y = log(x)
  4. Literal constants that would be hard to represent in floating point, like 1e-58. This is especially problematic in single-precision physics since some literal constants cannot be represented at that precision.
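
A minimal Fortran sketch of these guarded patterns (the variable names and the 1.0e-9 / 1.0e-20 thresholds are illustrative placeholders, not values from the convection scheme):

```fortran
program near_zero_guards
  implicit none
  real, parameter :: eps = 1.0e-9
  real :: x, y, z, w

  x = 1.0e-30          ! a value that may be flushed to zero under aggressive optimization
  y = 5.0e-21

  ! 1. compare against an epsilon instead of exact zero, i.e. avoid if (y == 0.0)
  if (abs(y) < 1.0e-20) y = 0.0

  ! 2. bound the denominator away from zero instead of z = 1.0 / x
  z = 1.0 / max(eps, x)

  ! 3. bound arguments to math intrinsics instead of w = log(x)
  w = log(max(eps, x))

  ! 4. avoid literals such as 1e-58 that single precision cannot represent
  print *, y, z, w
end program near_zero_guards
```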

@lisa-bengtsson
Contributor Author

@grantfirl @SamuelTrahanNOAA Thanks, I will take a look.

Grant, did your test of this PR also fail in the control_p8_faster intel test? The reason I'm asking is that I had this problem before, but then @jkbk2004 ran the RTs using this PR and the tests passed.

@grantfirl
Collaborator

Yes, my test also had failures for:
cpld_control_p8_faster intel
control_p8_faster intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel

@lisa-bengtsson
Contributor Author

Ok, thanks - let me see if I can understand it better following Sam's clues.

@lisa-bengtsson
Contributor Author

@SamuelTrahanNOAA Do you think it is possible that the RTs do not reproduce because of code inside the "if progomega" condition, even though progomega is false in all of the tests that fail? There aren't many lines of code outside of that statement that were updated.

@lisa-bengtsson
Contributor Author

An update: the options -DDEBUG=YES -DCCPP_32BIT=YES don't work for the coupled tests because compilation of MOM6 fails. Instead, I tried running cpld_debug_p8 as an atmosphere-only test, adding in the GFSv17 physics tested here and compiling with -DDEBUG=YES -DCCPP_32BIT=YES. It compiles and runs without problems. So unfortunately this option didn't provide any additional information as to why the comparison with the baseline fails when the -DFASTER option is used.

@SamuelTrahanNOAA
Collaborator

Perhaps I'm misunderstanding.

@grantfirl - You said the test "failed." What do you mean by that? Is it crashing? Or does it run to completion and produce different results?

If it's running to completion and producing different results, that's expected when there are changes to source code files used by the test. Changing the code can change the optimizations used in that code. That's true even if the changes are hidden behind an "if" block. For example, something that was vectorized may no longer be vectorized due to a new branch.

The code shouldn't crash, though. All the tests should run to completion.

@lisa-bengtsson
Contributor Author

lisa-bengtsson commented Mar 5, 2025

@SamuelTrahanNOAA no tests are crashing, but these tests fail when comparing the runs with the baselines:

cpld_control_p8_faster intel
control_p8_faster intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel

For comparison, the tests cpld_control_p8 and cpld_debug_p8 are OK; it is just the jobs with the -DFASTER flag that give different results compared to the baselines.

@SamuelTrahanNOAA
Collaborator

Is there some reason why those tests wouldn't change results?

If they run subroutines that you've changed, then results are expected to change. The -DFASTER=ON option enables higher levels of optimization. The compiler will be better at inlining and vectorizing, among other things. If you rearrange the code or otherwise modify it, then the optimizations the compiler can make in that code will change.

The only true way to confine test result changes to the intended effect of your changes is to disable optimization entirely. Otherwise, you have to accept that changing code in a subroutine may change the output of any test that uses that subroutine.

@grantfirl
Collaborator

@SamuelTrahanNOAA @lisa-bengtsson I think that the issue is that there have been MANY PRs that change physics source files in similar ways without triggering differences with the -DFASTER option. It's understood that this option increases optimization and that result changes are therefore possible; it's just that we haven't noticed this much (if at all) in practice. Ultimately, it's up to the developer (@lisa-bengtsson) and us (code managers) to decide whether the test changes are acceptable. So, if we can't find a reason in the code changes for these 4 failures besides the change in optimization, this could still be acceptable. It might be nice to narrow down EXACTLY which code change lines lead to the difference, though.

It is also harder to tell whether the -DFASTER option is even the reason for the differing results, because there are more differences between the cpld_control_p8, cpld_debug_p8, and cpld_control_p8_faster tests than just the use of -DFASTER. I don't know if this is intentional or not.

@lisa-bengtsson
Contributor Author

@grantfirl @SamuelTrahanNOAA thanks for explaining. Since the other tests pertaining to GFSv17 physics (or RTs using the samf* code) are bit-wise reproducible (including the debug tests), I think this supports Sam's argument that the differences pertain to optimization. I do change some intents and add arguments as input/output.
