Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Bug in modaradiation in case of warm start #40

Closed
jchylik opened this issue Mar 28, 2019 · 11 comments
Closed

Bug in modaradiation in case of warm start #40

jchylik opened this issue Mar 28, 2019 · 11 comments

Comments

@jchylik
Copy link

jchylik commented Mar 28, 2019

It has turned out that the bug is in the subroutine radiation in modradiation, line 204:

if((itimerad==0 .or. timee==tnext) .and. rk3step==1) then
   tnext = tnext+itimerad
   <calls to radiation schemes>

In the case of the warm start, the initial (modraddata) "tnext" is set lower than post-restart "timee". Therefore, (modraddata) "tnext" is never updated and radiation schemes are never computed. This leaves the model run only with 0 radiative tendency of large-scale radiate tendency.

The proposed quick fix is:

  if((itimerad==0 .or. timee>=tnext) .and. rk3step==1) then
   tnext = max(timee-dt_lim,tnext)+itimerad
  <calls to radiation schemes>

This will both:
a) set the tnext back on track in case of warm start in a consistent way
b) prevent running the radiation calculations btime/itimerad times

The other option is adding btime from modglobal to used variables, and adding lines:

if(tnext<btime) then   ! fix for btime after warm start
  tnext = btime+itimerad
end if  
if((itimerad==0 .or. timee>=tnext) .and. rk3step==1) then
   ...
@huugouwersloot
Copy link
Contributor

I would argue that the best solution is both easier and has more impact at the same time: in the read- and writerestartfiles (at least) tnext should be included as well.
In that case no further alterations would be needed (which would possibly result in altered calculations between 'normal' runs and restarted runs if the moment of the restart is not an integer amount times the period of the radiation cycle (as defined by itimerad))

@fjansson
Copy link
Contributor

fjansson commented Apr 1, 2019

I'd like it if all scheduled processes are handled in the same way. For example cross section netCDF writing has a similar scheduling mechanism, and works across restarts.
I think that one takes the restart time into account on line 127 of modcrosssection.f90, in initcrossection

idtav = dtav/tres
tnext   = idtav+btime

I also found such a line for radiation. Jan wrote in a mail "The timee from the restart files are read after all of the tnext variables has been set in their respective subroutines. ", and indeed startup calls initradiation before readinitfiles. How about moving initradiation down until after readinitfiles?

@jchylik
Copy link
Author

jchylik commented Apr 1, 2019

Unfortunately, putting initradiation after readrestartfiles would require many additional changes to the code. Some of the variables are written to restart files (and read) depending on the flag for radiations.

Adding tnext to restart files has also crossed my mind, but the issues is that the name is not unique. Module-specific variables called tnext appears in at least three different modules.
What is the DALES code convention for naming the included variables?
use modraddata, only: iradiation, useMcICA, tnext_raddata=> tnext

@fjansson
Copy link
Contributor

fjansson commented Apr 1, 2019

I see, you're right. At the end of readinitfiles, tnextrestart is updated with the new btime value: tnextrestart = btime + itrestart. We could do the same for tnext from modraddata. (Importing as you say, with renaming, seems fine with me.)

On the other hand, I start agreeing with Huug's point about trestart not necessarily being an integer multiple of the radiation interval. Also, there is (or used to be?) a mechanism to request a restart at any time by placing a special file in the working directory. Such a restart will then probably not occur on an integer multiple. Then the radiation interval will be different with a restart, and the restarted run will not be bitwise identical to a non-restarted run. I support solving this by adding tnext from radiation to the restart file. Should we then on the longer term save all the other tnexts as well, and aim for bit-perfect restarts at any time?

@huugouwersloot
Copy link
Contributor

Indeed, I think it would be worthwile to make an inventory of all subroutines where this time-dependence is of importance. I could imagine it's mainly important for the prognostic subroutines (so that the simulations are bitwise continuous); for the diagnostic subroutines it might be less important, although differences there could make it harder to compare results between numerical experiments.
I propose to make this inventory based on the version 4.2 that is to be released (in order not to further delay that release) and fix it in the development branch towards version 4.3 in an early stage.

@fjansson
Copy link
Contributor

fjansson commented Apr 3, 2019

I agree.

One more point for the radiation, from Stephan: currently a warm start must call the radiation routine on the first step. I think this is because the tendencies from radiation must be calculated there, because they are not stored in the save file (radiation tendencies seem to be stored only when MCICA is on). As long as this is the case, warm starts are not guaranteed to be bit-wise exact anyway.

So for 4.2 we fix restarts with radiation and ensure radiation is called on the first step. We live with the fact that restarts aren't bitwise exact always. For 4.3 we re-think the time scheduling and save files?

@huugouwersloot
Copy link
Contributor

I think it would be worthwhile to make an inventory of the routines that are 'losing' data in a restart and/or are handled in such a way that a restart could yield results that are not (bit-wise) identical.
These routines can then be adapted to make sure that making use of restarts does not affect the results of numerical experiments. In my opinion, indeed it would be great to apply the necessary changes in version 4.3.
Also for testbed purposes, this cleanup of the code would be relevant.

@jchylik
Copy link
Author

jchylik commented Apr 16, 2019

Indeed

I think it would be worthwhile to make an inventory of the routines that are 'losing' data in a restart and/or are handled in such a way that a restart could yield results that are not (bit-wise) identical.
These routines can then be adapted to make sure that making use of restarts does not affect the results of numerical experiments. In my opinion, indeed it would be great to apply the necessary changes in version 4.3.
Also for testbed purposes, this cleanup of the code would be relevant.

The only question remains how to keep the variable names consistent when adding more modules. So far I am considering option:

  tnext_<MODULE_NAME>=> tnext

@fjansson
Copy link
Contributor

I'm working on a fix for the radiation & restart bug. Currently, I'm in favour of saving tnext for radiation and thlprad, the temperature tendency from radiation, in the init files.

It seems runs with radiation are not going to be bitwise identical if restarted vs run at once anyway: both rrtmg and radfull use the current pressure profile at initialization,
and pressure is time dependent (modradrrtmg.f90, line 133)

fjansson pushed a commit that referenced this issue Feb 13, 2020
Radiation would not be performed after restarts at all, because tnext
was initialized before the start time of the simulation (btime) was
set.  Now, tnext for radiation is set to btime at the end of
modstartup, so that a radiation call is made at the very start of the
simulation.

For now I decided against trying to save the output from the radiation
routine in the init files - because of interaction with surface and
chemistry schemes, not just the temperature tendency but also other
radiation quantities need to be saved.

With radiation, restarts are still not bitwise exact. The
initialization of the radiation schemes, at least RRTMG and radfull,
use the current pressure profile, which is temperature dependent. This
causes small differences when the radiation scheme is re-initialized
at a warm start.

If bitwise exact restarts are important, it may be enough to save the
initial pressure profiles.
@fjansson
Copy link
Contributor

Proposed fix in commit 66d81a3, in the to4.3_Fredrik branch.

Radiation would not be performed after restarts at all, because tnext was initialized before the start time of the simulation (btime) was set. Now, tnext for radiation is set to btime at the end of modstartup, so that a radiation call is made at the very start of the
simulation.

For now I decided against trying to save the output from the radiation routine in the init files - because of interaction with surface and chemistry schemes, not just the temperature tendency but also other radiation quantities need to be saved.

With radiation, restarts are still not bitwise exact. The initialization of the radiation schemes, at least RRTMG and radfull, use the current pressure profile, which is temperature dependent. This causes small differences when the radiation scheme is re-initialized at a warm start.

If bitwise exact restarts are important, it may be enough to save the initial pressure profiles.

@fjansson
Copy link
Contributor

The fix above was applied in branch to4.3.

  • save all radiation output fields in the init files
  • save tnext for radiation in the init files
  • rrtmg: initialize using pressure profiles from the beginning of the simulation,
    to ensure the internal radiation tables are created identically at warm start

With this fix, a run with rrtmg radiation can be restarted in a bitwise exact way.
(radfull still does not restart exactly.)

Leaving issue open since radfull restart is not bitwise exact yet.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants