Clean up test lists to reduce total cost #179

Open
3 tasks
mnlevy1981 opened this issue Aug 15, 2024 · 0 comments

mnlevy1981 commented Aug 15, 2024

I was just looking at the model cost of running some CESM tests with MARBL turned on, and it's not good: an SMS.TL319_t232.G1850MARBL_JRA.derecho_intel test costs 1000 cpu-hours (a comparable test without MARBL is in the neighborhood of 80 cpu-hours). I think the bulk of the additional cost comes from having every MARBL diagnostic in the diag_table -- by default MARBL asks MOM6 to write "minimal" output (49 fields) in fully coupled runs, "full" output (238 fields) in ocean-only and FOSI runs, and every diagnostic (353 fields) if the test suite is on.
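To make the default selection described above concrete, here is a minimal sketch of the decision as I read it from this paragraph. The function name, argument names, and mode labels are hypothetical and are not taken from the MARBL or MOM6 source; only the three field counts come from the text.

```python
# Hypothetical sketch: identifiers are illustrative, not MARBL/MOM6 names.
# Field counts are the ones quoted in this issue.
FIELD_COUNTS = {"minimal": 49, "full": 238, "all": 353}


def default_diag_mode(fully_coupled: bool, test_suite: bool) -> str:
    """Return the diagnostic output level implied by the defaults above."""
    if test_suite:
        # The test suite currently requests every diagnostic.
        return "all"
    # Fully coupled runs get minimal output; ocean-only and FOSI get full.
    return "minimal" if fully_coupled else "full"


def field_count(fully_coupled: bool, test_suite: bool) -> int:
    """Number of fields written under the selected output level."""
    return FIELD_COUNTS[default_diag_mode(fully_coupled, test_suite)]
```

Under this reading, an ocean-only test writes 353 fields instead of the 238 it would write outside the test suite, and a fully coupled test writes 353 instead of 49.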

I think we want to do the following:

  • the aux_mom_MARBL test list should continue to write the full output, but tests should be shortened (SMS_Ld2 instead of SMS should cost ~400 cpu-hours, which isn't quite as bad)
  • the prealpha, prebeta, and aux_mom test lists should either (a) write only the default output based on the compset, or (b) write minimal output for both fully coupled runs and runs that default to full output
  • @alperaltuntas if we turn on FMS's parallel I/O for the test suite, will cprnc still be able to compare new tests to a baseline? I don't want to lose bit-for-bit checks, but if the archiver and test system don't care that each time slice is broken across multiple files that would help reduce cost as well

edit: When I initially posted this, I was looking at cpu-hrs / year rather than total cpu-hrs. I've adjusted the numbers in the opening paragraph, but the point still stands -- we think of MARBL as increasing cost somewhere between 3x and 5x, but the tests are 12x more expensive, and I think a lot of the gap between "3-5x" and "12x" is due to I/O.
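The "12x" figure follows directly from the two costs quoted in the opening paragraph. A minimal sketch of the arithmetic, assuming the non-MARBL test costs exactly 80 cpu-hours (the issue only says "in the neighborhood of 80"); the variable names are illustrative:

```python
# Costs quoted in this issue, in cpu-hours. The baseline is approximate.
marbl_test_cost = 1000.0
baseline_test_cost = 80.0

# Observed slowdown of the MARBL test relative to the baseline.
ratio = marbl_test_cost / baseline_test_cost  # 12.5

# Expected MARBL slowdown range mentioned in the issue.
expected_low, expected_high = 3.0, 5.0

# Factor left unexplained by the expected slowdown, which the issue
# attributes largely (not necessarily entirely) to I/O.
unexplained_min = ratio / expected_high  # 2.5
unexplained_max = ratio / expected_low   # about 4.2

print(f"observed: {ratio:.1f}x, unexplained: "
      f"{unexplained_min:.1f}x to {unexplained_max:.1f}x")
```

So if the 3x to 5x estimate holds, the remaining factor of roughly 2.5x to 4x is the part that trimming diagnostics could plausibly recover.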
