Unable to read in CRTM coefficients, using inline post, while running GNU executables on Hera #2537

Open
MichaelLueken opened this issue Dec 13, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@MichaelLueken
Collaborator

Description

While running the weather model on Hera with GNU-built executables, inline post fails when using the postxconfig-NT-rrfs.txt post config file. This configuration uses simulated radiances, so CRTM coefficients must be read in. The inline post fails with the following error messages:

 Check_Binary_File(FAILURE) : Data file needs to be byte-swapped.
 Open_Binary_File(FAILURE) : Error checking imgr_g15.SpcCoeff.bin file byte order
 SpcCoeff_ReadFile(Binary)(FAILURE) : Error opening imgr_g15.SpcCoeff.bin
 CRTM_SpcCoeff_Load(FAILURE) : Error reading SpcCoeff file #1, imgr_g15.SpcCoeff.bin; Process ID: 0
 CRTM_Init(FAILURE) : Error loading SpcCoeff data; Process ID: 0
 ERROR*** crtm_init error_status=      3
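
The "Data file needs to be byte-swapped" message means the file's byte order does not match what the executable expects. As a rough way to check a file outside the model, the sketch below guesses the byte order of a Fortran unformatted sequential file from its first record markers. It assumes 4-byte record markers before and after each record (the gfortran default); it does not reproduce CRTM's own Check_Binary_File logic, and `detect_record_byte_order` and `write_record` are hypothetical helper names.

```python
import struct
import tempfile
from pathlib import Path


def detect_record_byte_order(path):
    """Guess the byte order of a Fortran unformatted sequential file.

    Assumes 4-byte record-length markers around each record.
    Returns "little", "big", or "unknown".
    """
    data = Path(path).read_bytes()
    if len(data) < 8:
        return "unknown"
    matches = []
    for name, fmt in (("little", "<i"), ("big", ">i")):
        (length,) = struct.unpack(fmt, data[:4])
        end = 4 + length
        # A valid first record carries the same length before and after it.
        if length >= 0 and end + 4 <= len(data):
            (trailer,) = struct.unpack(fmt, data[end:end + 4])
            if trailer == length:
                matches.append(name)
    # If both or neither interpretation fits, do not guess.
    return matches[0] if len(matches) == 1 else "unknown"


def write_record(path, payload, fmt):
    """Write one record framed by length markers in the given byte order."""
    marker = struct.pack(fmt, len(payload))
    Path(path).write_bytes(marker + payload + marker)


# Self-check on two synthetic one-record files (not real CRTM data).
with tempfile.TemporaryDirectory() as tmp:
    le_file = Path(tmp) / "little.bin"
    be_file = Path(tmp) / "big.bin"
    write_record(le_file, struct.pack("<i", 7), "<i")
    write_record(be_file, struct.pack(">i", 7), ">i")
    little_result = detect_record_byte_order(le_file)
    big_result = detect_record_byte_order(be_file)
    print(little_result, big_result)
```

Pointing the same function at imgr_g15.SpcCoeff.bin should show which variant of the file is actually installed, provided the record-marker assumption holds for it.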

Looking in cmake/GNU.cmake, I noted that there was no -fconvert=big-endian in the Fortran flags. Adding this flag allows the inline post to run successfully, but other tests in the weather model then fail:

Error termination. Backtrace:
At line 4448 of file /scratch1/NCEPDEV/stmp2/Michael.Lueken/ufs-srweather-app/sorc/ufs-weather-model/FV3/ccpp/physics/physics/MP/Thompson/module_mp_thompson.F90 (unit = 63, file = 'qr_acr_qgV2.dat')
Fortran runtime error: End of file
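
One plausible explanation for the "End of file" error is that qr_acr_qgV2.dat was written in little-endian order, so a global big-endian conversion misreads its record-length marker as a huge value and the runtime hits the end of the file before the record is complete. The sketch below shows only the arithmetic; the 40-byte record length is a made-up value, not taken from the actual file.

```python
import struct

true_length = 40  # hypothetical record length in bytes

# Marker as written by a little-endian writer.
marker = struct.pack("<i", true_length)

# The same four bytes interpreted with big-endian conversion.
(misread_length,) = struct.unpack(">i", marker)

print(true_length, misread_length)  # 40 671088640
```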

Should inline post not be run using GNU-built executables?

To Reproduce:

The SRW App only runs GNU tests on Hera, but this would likely apply to all machines that can allow for GNU builds.

  1. Clone my feature/hash_update branch on Hera - git clone -b feature/hash_update git@github.com:MichaelLueken/ufs-srweather-app.git
  2. cd ufs-srweather-app
  3. Checkout externals - ./manage_externals/checkout_externals
  4. Build the GNU executables - ./devbuild.sh -p=hera -c=gnu
  5. module use modulefiles
  6. module load wflow_hera
  7. conda activate srw_app
  8. cd tests/WE2E
  9. Run the fundamental WE2E test suite using ./run_WE2E_tests.py -t fundamental -m hera -a <account> -c gnu (replace <account> with your project on Hera). The inline post test fails with the above error message in ../../../expt_dirs/grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta/log/run_fcst_mem000_2020081000.log
@DusanJovic-NOAA
Collaborator

Inline post should run using the GNU compiler.

Since the code currently does not set the -fconvert flag at the top level, the executable does no byte-order conversion on Fortran unformatted files and reads them in native (little-endian) order. I assume the 'qr_acr_qgV2.dat' file is little-endian; otherwise, the code would fail while reading it.

We should keep all Fortran unformatted files in native endianness, which is little-endian these days on all the platforms we support, and avoid setting the conversion flag globally.

I see CRTM provides both little- and big-endian versions of this file:

https://github.com/JCSDA/crtm/blob/v2.4.0_emc.3/fix/SpcCoeff/Little_Endian/imgr_g15.SpcCoeff.bin

https://github.com/JCSDA/crtm/blob/v2.4.0_emc.3/fix/SpcCoeff/Big_Endian/imgr_g15.SpcCoeff.bin

Can you try using the little-endian version?

@MichaelLueken
Collaborator Author

Hello @DusanJovic-NOAA.

I was able to successfully run the inline post test using little endian CRTM coefficient files.

Unfortunately, there are two issues with using the little endian CRTM coefficient files:

  • While building spack-stack, only big endian coefficient files are available. I had to manually clone the JCSDA/crtm repo, clone the GSI repo, then use an old script from the GSI to link the little endian CRTM coefficients into a form that can be used in the weather model.
  • All WE2E tests that don't run inline post fail (which is most of the tests). The UPP repo uses the -fconvert flag, so the offline post tests all fail, since UPP can't read in the little endian coefficients.
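
For reference, the manual linking step in the first bullet can be sketched roughly as follows. This is not the GSI script mentioned above. It assumes the fix/<type>/Little_Endian layout visible in the JCSDA/crtm URLs earlier in the thread, and `link_little_endian` and the target directory name `crtm_fix_le` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_little_endian(crtm_fix_root, target_dir):
    """Symlink every file under any Little_Endian directory into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for le_dir in sorted(Path(crtm_fix_root).rglob("Little_Endian")):
        if not le_dir.is_dir():
            continue
        for src in sorted(le_dir.iterdir()):
            if not src.is_file():
                continue
            dest = target / src.name
            if dest.is_symlink() or dest.exists():
                dest.unlink()  # replace a stale link from an earlier run
            os.symlink(src.resolve(), dest)
            linked.append(dest.name)
    return linked


# Self-check on a throwaway directory tree that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    fix_root = Path(tmp) / "crtm" / "fix"
    for order in ("Little_Endian", "Big_Endian"):
        coeff_dir = fix_root / "SpcCoeff" / order
        coeff_dir.mkdir(parents=True)
        # Stand-in content so the two variants can be told apart.
        (coeff_dir / "imgr_g15.SpcCoeff.bin").write_text(order)
    flat_dir = Path(tmp) / "crtm_fix_le"
    linked = link_little_endian(fix_root, flat_dir)
    linked_content = (flat_dir / "imgr_g15.SpcCoeff.bin").read_text()
    print(linked, linked_content)
```

A flat directory like this only helps executables built without the conversion flag; as the second bullet notes, programs built with -fconvert still need the big-endian set.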

Is NCO planning on moving to native endian files for Linux machines, or will they continue to use the old IBM big endian files? If they will change to little endian, then I can close this issue and wait until the convert flags are removed to move forward with this. If there are no plans to use little endian files, then the path forward isn't clear.

@DusanJovic-NOAA
Collaborator

I don't know whether NCO has any plans to move to native-endian files. I checked the implementation standards document (here) and I do not see any mention of a required endianness for Fortran unformatted files. If there are in fact no such requirements, then I guess each system can choose whatever format is most convenient. In that sense, we (the UFS community) should be able to choose whatever we want. The only issue we need to be aware of is that all unformatted files from all components must use the same endianness, which is currently not the case: as you found, some files used by CCPP are little-endian, while some UPP files are big-endian. This matters when the GNU compiler is used. Beyond using a single endianness within the ufs-weather-model and all of its components, it would be nice to be consistent with other programs, for example standalone UPP or other preprocessor/postprocessor programs, so that workflows do not need two sets of files, one for the model executable and one for other programs.
