
Frontier UVM and other preparations to reproduce 100T #967


Merged: 20 commits merged into MFlowCode:master on Aug 3, 2025

Conversation

Collaborator

@wilfonba wilfonba commented Jul 25, 2025

User description

Description

Adds UVM support and I/O downsampling for Frontier. Also removes some memory allocations from pre_process that cause out-of-memory errors for large simulations. I validated that the cases with modified golden files pass by hacking the serial I/O output to skip the line-count failures before regenerating the golden data. I still need to do some manual tests, but I'm marking this as ready for review now to speed up the merge process.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

Scope

  • [x] This PR comprises a set of related changes with a common goal

How Has This Been Tested?

Unified Roc-Prof Profile

resultsUNIFIED.csv

No Unified Roc-Prof Profile

resultsNOUNIFIED.csv

Downsample verification. The left frame shows the full resolution and the right frame shows the downsampled result. The differences are present but difficult to see.

out.mp4

PR Type

Enhancement, Tests


Description

Frontier Unified Memory Support: Implemented unified memory allocation for Frontier HPC system with HIP memory pools and conditional compilation flags
Data Downsampling Feature: Added comprehensive downsampling functionality across pre-process, simulation, and post-process modules with 3x3x3 averaging algorithm
IGR (Information Geometric Regularization) Fixes: Fixed IGR alpha calculations and variable conversion for single-fluid cases across multiple modules
Memory Optimization: Removed additional memory allocations from pre-process to prevent out-of-memory errors in large simulations
Optional Parameter Support: Made IB (immersed boundary) parameters optional in levelset computations and patch applications
MPI Integration: Updated MPI parameter broadcasts and data initialization to support downsampling features
Configuration Updates: Added CMake support for Frontier unified memory, updated case dictionaries, and modified Frontier template
Test Updates: Regenerated golden metadata files for multiple test cases with updated timestamps and configuration


Diagram Walkthrough

flowchart LR
  A["Frontier UVM"] --> B["HIP Memory Pools"]
  A --> C["Unified Memory Allocation"]
  D["Downsampling"] --> E["3x3x3 Averaging"]
  D --> F["Upsampling with Trilinear Interpolation"]
  G["IGR Fixes"] --> H["Single Fluid Alpha Calculation"]
  I["Memory Optimization"] --> J["Reduced Pre-process Allocations"]
  K["Optional Parameters"] --> L["IB Levelset Functions"]
  B --> M["Time Steppers"]
  E --> N["Data I/O Modules"]
  F --> N

File Walkthrough

Relevant files
Enhancement
23 files
m_data_output.fpp
Add downsampling support to pre-process data output           

src/pre_process/m_data_output.fpp

• Added m_helper module import and a q_cons_temp temporary variable for downsampling
• Modified function signatures to make parameters optional and change intent from in to inout
• Added conditional logic for IB (immersed boundary) data output
• Integrated downsampling functionality with buffer population and data processing

+123/-54
m_time_steppers.fpp
Implement Frontier unified memory for time steppers           

src/simulation/m_time_steppers.fpp

• Added Frontier unified memory support with HIP memory allocation
• Implemented device and host memory pools for conservative variables
• Modified Runge-Kutta time-stepping loops to support the unified memory layout
• Added conditional compilation blocks for the FRONTIER_UNIFIED preprocessor flag

+164/-3 
m_data_input.f90
Add downsampling support to post-process data input           

src/post_process/m_data_input.f90

• Added q_cons_temp temporary variable for downsampling support
• Modified grid coordinate reading to support uniform spacing for downsampled data
• Updated MPI data reading to handle downsampled file formats
• Added conditional logic for downsampling in parallel data file reading

+110/-32
m_data_output.fpp
Add downsampling support to simulation data output             

src/simulation/m_data_output.fpp

• Added m_boundary_common module import and q_cons_temp variable
• Modified function signatures to include a bc_type parameter and change intent
• Integrated downsampling functionality with conditional buffer allocation
• Updated MPI data writing to support downsampled output formats

+112/-40
m_start_up.fpp
Integrate downsampling support in simulation startup         

src/simulation/m_start_up.fpp

• Added down_sample parameter to input reading and validation
• Implemented downsampled data reading and upsampling functionality
• Modified data file reading to support both regular and downsampled formats
• Added conditional memory allocation for downsampling temporary variables

+94/-29 
m_global_parameters.fpp
Add downsampling parameter and fix IGR indexing                   

src/pre_process/m_global_parameters.fpp

• Added down_sample logical parameter with default value false
• Fixed IGR (information geometric regularization) index configuration for single-fluid cases
• Modified MPI data allocation to skip when downsampling is enabled
• Updated coordinate bounds configuration to include the igr_order parameter

+18/-12 
m_compute_levelset.fpp
Make levelset parameters optional and reorder signatures 

src/pre_process/m_compute_levelset.fpp

• Modified all levelset computation subroutines to make parameters optional
• Changed parameter order to put ib_patch_id first in function signatures
• Updated function interfaces for circle, airfoil, rectangle, cuboid, sphere, and cylinder levelsets

+21/-21 
m_global_parameters.fpp
Add downsampling and domain parameters to post-process     

src/post_process/m_global_parameters.fpp

• Added down_sample logical parameter and domain boundary variables
• Fixed IGR index configuration similar to the pre-process module
• Modified MPI data allocation to support downsampled array dimensions
• Updated parameter initialization with domain boundary defaults

+23/-7   
m_global_parameters.fpp
Add downsampling parameter to simulation globals                 

src/simulation/m_global_parameters.fpp

• Added down_sample parameter with GPU declaration support
• Fixed IGR index configuration for consistent behavior across modules
• Modified MPI data allocation to skip when downsampling is enabled
• Updated coordinate bounds configuration call with the igr_order parameter

+14/-11 
m_sim_helpers.fpp
Add data upsampling functionality and fix IGR alpha           

src/simulation/m_sim_helpers.fpp

• Fixed IGR alpha calculation to use a constant value for the single-fluid case
• Added s_upsample_data subroutine for converting downsampled data back to full resolution
• Implemented a trilinear interpolation algorithm for upsampling conservative variables (sketched below)

+41/-2   
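
For readers unfamiliar with the scheme, the following is a minimal sketch of trilinear upsampling of a 3x-coarsened, cell-centered field, using the 2/3 and 1/3 weights that appear in the reviewer suggestion later in this thread. The subroutine name, the edge clamping, and the single-variable interface are illustrative assumptions, not the actual s_upsample_data signature.

! Minimal upsampling sketch (illustrative, single variable; the real routine loops
! over sys_size and may handle boundaries via buffer cells). Fine grid is 0:m with
! m + 1 = 3*(m_ds + 1), and likewise in y and z.
subroutine s_upsample_sketch(q_coarse, q_fine, m_ds, n_ds, p_ds, m, n, p)
    implicit none
    integer, intent(in) :: m_ds, n_ds, p_ds, m, n, p
    real(kind(0d0)), intent(in) :: q_coarse(0:m_ds, 0:n_ds, 0:p_ds)
    real(kind(0d0)), intent(out) :: q_fine(0:m, 0:n, 0:p)
    integer :: i, j, k, ix, iy, iz, x_id, y_id, z_id
    real(kind(0d0)) :: c1, c2, plane_lo, plane_hi

    do k = 0, p
        do j = 0, n
            do i = 0, m
                ix = i/3; iy = j/3; iz = k/3
                ! Direction of the neighboring coarse cell: -1, 0, or +1
                x_id = mod(i, 3) - 1; y_id = mod(j, 3) - 1; z_id = mod(k, 3) - 1
                ! Clamp at the domain edges (assumption) so we never index outside q_coarse
                if (ix + x_id < 0 .or. ix + x_id > m_ds) x_id = 0
                if (iy + y_id < 0 .or. iy + y_id > n_ds) y_id = 0
                if (iz + z_id < 0 .or. iz + z_id > p_ds) z_id = 0
                ! Interpolate in x, then y, in the iz plane ...
                c1 = (2d0/3d0)*q_coarse(ix, iy, iz) + (1d0/3d0)*q_coarse(ix + x_id, iy, iz)
                c2 = (2d0/3d0)*q_coarse(ix, iy + y_id, iz) + (1d0/3d0)*q_coarse(ix + x_id, iy + y_id, iz)
                plane_lo = (2d0/3d0)*c1 + (1d0/3d0)*c2
                ! ... and in the iz + z_id plane, then blend in z
                c1 = (2d0/3d0)*q_coarse(ix, iy, iz + z_id) + (1d0/3d0)*q_coarse(ix + x_id, iy, iz + z_id)
                c2 = (2d0/3d0)*q_coarse(ix, iy + y_id, iz + z_id) + (1d0/3d0)*q_coarse(ix + x_id, iy + y_id, iz + z_id)
                plane_hi = (2d0/3d0)*c1 + (1d0/3d0)*c2
                ! An offset of 0 makes both terms identical, so block centers reduce to the coarse value
                q_fine(i, j, k) = (2d0/3d0)*plane_lo + (1d0/3d0)*plane_hi
            end do
        end do
    end do
end subroutine s_upsample_sketch

With 3x coarsening, each off-center fine-cell center sits one third of a coarse spacing from its own coarse center and two thirds from the neighbor's, which is where the 2/3 and 1/3 weights come from.
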
m_patches.fpp
Make IB parameters optional in patch application                 

src/pre_process/m_patches.fpp

• Made IB-related parameters optional in the s_apply_domain_patches subroutine
• Updated levelset function calls to use the new parameter ordering
• Added conditional logic for IB marker and levelset processing

+11/-10 
m_assign_variables.fpp
Add IGR conditional logic to variable assignment                 

src/pre_process/m_assign_variables.fpp

• Added conditional logic to skip volume fraction assignment for IGR single-fluid cases
• Updated patch assignment and smoothing functions with IGR checks
• Modified variable assignment to handle IGR special cases properly

+20/-14 
m_initial_condition.fpp
Make IB allocations conditional in initial conditions       

src/pre_process/m_initial_condition.fpp

• Made IB-related allocations conditional on the ib flag
• Updated the patch application call to handle optional IB parameters
• Added proper initialization and deallocation of IB-related variables

+16/-7   
m_helper.fpp
Add data downsampling functionality                                           

src/common/m_helper.fpp

• Added s_downsample_data subroutine to the public interface
• Implemented a 3x3x3 averaging algorithm for downsampling conservative variables (sketched below)
• Added GPU data transfer support for downsampling operations

+47/-1   
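
As a point of reference, the 3x3x3 averaging itself is simple. The sketch below shows the core idea assuming cell-centered data and 0-based extents; the subroutine name and plain-array interface are illustrative, not s_downsample_data's actual interface, which operates on MFC's derived field types and includes the GPU data transfer noted above.

! Minimal 3x3x3 block-averaging sketch (illustrative). Fine extents 0:m, 0:n, 0:p
! with m + 1 = 3*(m_ds + 1), etc., matching the divisibility-by-3 requirement
! enforced in m_checker.fpp further below.
subroutine s_downsample_sketch(q_fine, q_coarse, m, n, p, m_ds, n_ds, p_ds, sys_size)
    implicit none
    integer, intent(in) :: m, n, p, m_ds, n_ds, p_ds, sys_size
    real(kind(0d0)), intent(in) :: q_fine(0:m, 0:n, 0:p, 1:sys_size)
    real(kind(0d0)), intent(out) :: q_coarse(0:m_ds, 0:n_ds, 0:p_ds, 1:sys_size)
    integer :: i, j, k, l

    do l = 1, sys_size
        do k = 0, p_ds
            do j = 0, n_ds
                do i = 0, m_ds
                    ! Each coarse cell is the arithmetic mean of its 3x3x3 block of fine cells
                    q_coarse(i, j, k, l) = sum(q_fine(3*i:3*i + 2, 3*j:3*j + 2, 3*k:3*k + 2, l))/27d0
                end do
            end do
        end do
    end do
end subroutine s_downsample_sketch
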
m_mpi_proxy.fpp
Add downsampling parameters to MPI broadcasts                       

src/post_process/m_mpi_proxy.fpp

• Added down_sample to the logical parameter broadcast list
• Added domain boundary parameters to the real parameter broadcast list
• Updated MPI communication to include the new downsampling parameters

+5/-2     
m_boundary_common.fpp
Add intent specifications to boundary functions                   

src/common/m_boundary_common.fpp

• Added intent(in) specifications to function parameters
• Updated function signatures for better const-correctness
• Modified boundary condition file writing functions with proper intent declarations

+7/-7     
m_mpi_common.fpp
Add downsampled MPI data initialization                                   

src/common/m_mpi_common.fpp

• Added s_initialize_mpi_data_ds subroutine for downsampled MPI data initialization
• Implemented downsampled array size calculations and MPI type creation (see the sketch below)
• Added conditional compilation for post-process vs. other modules

+51/-0   
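
For context on the "MPI type creation" bullet above, the usual pattern for describing each rank's block of a downsampled global array is an MPI subarray type. The sketch below only illustrates that pattern, reusing the extent names from this PR (m_ds, m_glb_ds, ...) plus a hypothetical start_idx_ds offset; it may not match how s_initialize_mpi_data_ds is actually written.

! Illustrative sketch: build an MPI subarray type describing this rank's portion
! of the downsampled global array (0-based extents as elsewhere in MFC).
subroutine s_build_ds_view(m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds, start_idx_ds, view_type)
    use mpi
    implicit none
    integer, intent(in) :: m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds
    integer, intent(in) :: start_idx_ds(3) ! hypothetical 0-based global offset of this rank's block
    integer, intent(out) :: view_type
    integer :: sizes(3), subsizes(3), starts(3), ierr

    sizes = [m_glb_ds + 1, n_glb_ds + 1, p_glb_ds + 1] ! global downsampled grid
    subsizes = [m_ds + 1, n_ds + 1, p_ds + 1]          ! this rank's local block
    starts = start_idx_ds                              ! MPI subarray starts are 0-based

    call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, MPI_ORDER_FORTRAN, &
                                  MPI_DOUBLE_PRECISION, view_type, ierr)
    call MPI_TYPE_COMMIT(view_type, ierr)
end subroutine s_build_ds_view

Such a type is typically handed to MPI_FILE_SET_VIEW so each rank reads or writes its block of the downsampled file directly.
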
m_mpi_proxy.fpp
Add downsampling parameters to pre-process MPI                     

src/pre_process/m_mpi_proxy.fpp

• Added igr_order to the integer parameter broadcast list
• Added down_sample to the logical parameter broadcast list
• Updated MPI parameter synchronization for the new downsampling features

+2/-2     
m_start_up.f90
Add downsampling support to post-process startup                 

src/post_process/m_start_up.f90

• Added downsampling parameters to input file reading
• Modified grid dimensions when downsampling is enabled
• Updated global cell count calculation with proper integer types

+9/-2     
m_helper_basic.fpp
Update coordinate bounds configuration for IGR                     

src/common/m_helper_basic.fpp

• Updated s_configure_coordinate_bounds to include an igr_order parameter
• Modified buffer size calculation for IGR to use igr_order instead of hardcoded values
• Improved coordinate bounds configuration for IGR cases

+3/-7     
m_start_up.fpp
Add downsampling parameter to pre-process startup               

src/pre_process/m_start_up.fpp

• Added down_sample parameter to input file reading
• Updated the data file writing call to handle optional IB parameters
• Added conditional logic for IB parameter passing

+7/-2     
m_checker.fpp
Add validation checks for downsampling feature                     

src/pre_process/m_checker.fpp

• Added comprehensive validation checks for the downsampling feature
• Enforced requirements: parallel I/O, IGR, 3D, file-per-process, and grid divisibility by 3 (see the sketch below)
• Added error messages for invalid downsampling configurations

+8/-0     
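
The guards listed above translate into straightforward checks. The following is a minimal sketch, assuming flag names like parallel_io, igr, and file_per_process and using a plain error stop; MFC's m_checker.fpp actually reports errors through its own checker/abort machinery, so treat this purely as an illustration.

! Illustrative validation sketch for the down_sample option (0-based extents m, n, p).
subroutine s_check_down_sample(down_sample, parallel_io, igr, file_per_process, m, n, p)
    implicit none
    logical, intent(in) :: down_sample, parallel_io, igr, file_per_process
    integer, intent(in) :: m, n, p

    if (.not. down_sample) return
    if (.not. parallel_io) error stop "down_sample requires parallel_io = T"
    if (.not. igr) error stop "down_sample requires igr = T"
    if (p == 0) error stop "down_sample requires a 3D case"
    if (.not. file_per_process) error stop "down_sample requires file-per-process I/O"
    ! Each dimension must hold a whole number of 3x3x3 blocks
    if (mod(m + 1, 3) /= 0 .or. mod(n + 1, 3) /= 0 .or. mod(p + 1, 3) /= 0) then
        error stop "down_sample requires each grid dimension to be divisible by 3"
    end if
end subroutine s_check_down_sample
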
m_mpi_proxy.fpp
Add downsampling parameter to simulation MPI                         

src/simulation/m_mpi_proxy.fpp

• Added down_sample to the logical parameter broadcast list
• Updated MPI parameter synchronization for the simulation module

+1/-1     
Bug fix
1 file
m_variables_conversion.fpp
Fix IGR variable conversion for single fluid cases             

src/common/m_variables_conversion.fpp

• Fixed IGR alpha calculations to use a constant value for single-fluid cases
• Added conditional logic to skip advection variable processing for IGR single fluid
• Updated primitive and conservative variable conversion routines

+14/-10 
Configuration changes
4 files
case_dicts.py
Add downsampling parameters to case dictionaries                 

toolchain/mfc/run/case_dicts.py

• Added down_sample parameter as a logical type to the common parameters
• Added domain boundary parameters for the x, y, z coordinates
• Updated parameter dictionaries for pre-process configuration

+5/-0     
frontier.mako
Add unified memory environment variable to Frontier template

toolchain/templates/frontier.mako

• Added conditional export of the CRAY_ACC_USE_UNIFIED_MEM=1 environment variable
• Enabled unified memory support when the unified flag is set

+4/-0     
CMakeLists.txt
Add Frontier unified memory CMake configuration                   

CMakeLists.txt

• Added Frontier unified memory support for the Cray compiler
• Added the -DFRONTIER_UNIFIED compile flag when MFC_Unified is enabled
• Integrated HIP and hipfort library linking for unified memory

+6/-0     
.typos.toml
Add HSA to typos whitelist                                                             

.typos.toml

• Added HSA to the list of accepted words in typos configuration

+1/-0     
Tests
13 files
golden-metadata.txt
Update test metadata for 106C0BE6                                               

tests/106C0BE6/golden-metadata.txt

• Updated test metadata with new timestamps and git commit information

• Changed build configuration order and environment details

+22/-55 
golden-metadata.txt
Update test metadata for CEAF553A                                               

tests/CEAF553A/golden-metadata.txt

• Updated test metadata with new timestamps and git commit information

• Changed build configuration order and environment details

+22/-55 
golden-metadata.txt
Update test metadata for single test execution                     

tests/6077374F/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation command from batch test generation to single test
• Reordered build configuration sections and updated environment variables
• Modified hostname from network address to local machine name

+20/-53 
golden-metadata.txt
Update test metadata for single test execution                     

tests/AE3FC5CB/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname and removed documentation section

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/0045D9F8/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/4440D46B/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/C7A6B609/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/E8979E4A/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+16/-49 
golden-metadata.txt
Update test metadata for single test execution                     

tests/AFBCBDFA/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+16/-49 
golden-metadata.txt
Update test metadata for single test execution                     

tests/37DDE283/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables and removed documentation section
• Updated hostname and reordered build configuration sections

+14/-47 
golden-metadata.txt
Update test metadata for single test execution                     

tests/EB58AF7F/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables and removed simulation section
• Updated hostname and reordered build configuration sections

+14/-47 
golden-metadata.txt
Update test metadata for single test execution                     

tests/A57E30FE/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables with compiler paths
• Updated hostname from network address to local machine

+10/-43 
golden-metadata.txt
Update test metadata for single test execution                     

tests/E49EF7B6/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Reordered build configuration sections
• Removed documentation section from configuration
• Updated hostname from network address to local machine

+12/-45 
Formatting
1 file
modules
Clean up module configuration formatting                                 

toolchain/modules

• Minor formatting cleanup removing trailing whitespace
• Updated module configuration for various HPC systems

+2/-2     

@Copilot Copilot AI review requested due to automatic review settings July 25, 2025 14:49
@wilfonba wilfonba requested review from a team and sbryngelson as code owners July 25, 2025 14:49
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR prepares for 100T simulations on Frontier by adding unified virtual memory (UVM) support and I/O downsampling capabilities. It removes memory allocation bottlenecks from pre_process to handle large-scale simulations and updates test golden files to reflect minor numerical differences.

Key Changes

  • Added UVM support for Frontier via environment variable configuration
  • Implemented I/O downsampling functionality with new parameter types
  • Updated test golden files to reflect numerical precision changes

Reviewed Changes

Copilot reviewed 44 out of 55 changed files in this pull request and generated no comments.

File Description
toolchain/templates/frontier.mako Adds conditional UVM environment variable support
toolchain/mfc/run/case_dicts.py Adds downsampling parameter and domain boundary parameters
tests/EB58AF7F/golden.txt Updates golden test data with minor numerical changes
tests/EB58AF7F/golden-metadata.txt Updates test metadata with new generation timestamp
Comments suppressed due to low confidence (1)

toolchain/templates/frontier.mako:47

  • [nitpick] The variable name 'unified' is ambiguous in this context. Consider using a more descriptive name like 'enable_unified_memory' or 'use_uvm' to clearly indicate this controls unified virtual memory.
%if unified:


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Duplicate Code

The downsampling logic is duplicated across multiple files (pre_process, simulation, post_process) with nearly identical implementations. This violates the DRY principle and makes maintenance difficult.

    if ((mod(m + 1, 3) > 0) .or. (mod(n + 1, 3) > 0) .or. (mod(p + 1, 3) > 0)) then
        print *, "WARNING: ATTEMPTING TO RUN DOWN SAMPLING WITH local problem size not divisible by 3"
    end if
    call s_populate_variables_buffers(bc_type, q_cons_vf)
    call s_downsample_data(q_cons_vf, q_cons_temp, &
                           m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds)
end if
Possible Issue

The FRONTIER_UNIFIED memory management uses complex pointer assignments and HIP memory allocation that could lead to memory corruption or access violations if not properly synchronized between host and device.

#ifdef FRONTIER_UNIFIED
        if (proc_rank == 0 .and. num_ts /= 2) then
            call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
        end if

        ! Allocate to memory regions using hip calls
        ! that we will attach pointers to
        do i = 1, 3
            pool_dims(i) = idwbuff(i)%end - idwbuff(i)%beg + 1
            pool_starts(i) = idwbuff(i)%beg
        end do
        pool_dims(4) = sys_size
        pool_starts(4) = 1

        ! Doing hipMalloc then mapping should be most performant
        call hipCheck(hipMalloc(q_cons_ts_pool_device, dims8=pool_dims, lbounds8=pool_starts))
        ! Without this map CCE will still create a device copy, because it's silly like that
        call acc_map_data(q_cons_ts_pool_device, c_loc(q_cons_ts_pool_device), c_sizeof(q_cons_ts_pool_device))

        ! CCE see it can access this and will leave it on the host. It will stay on the host so long as HSA_XNACK=1
        ! NOTE: WE CANNOT DO ATOMICS INTO THIS MEMORY. We have to change a property to use atomics here
        ! Otherwise leaving this as fine-grained will actually help performance since it can't be cached in GPU L2
        call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))

        do j = 1, sys_size
            ! q_cons_ts(1) lives on the device
            q_cons_ts(1)%vf(j)%sf(idwbuff(1)%beg:idwbuff(1)%end, &
                                  idwbuff(2)%beg:idwbuff(2)%end, &
                                  idwbuff(3)%beg:idwbuff(3)%end) => q_cons_ts_pool_device(:, :, :, j)
            ! q_cons_ts(2) lives on the host
            q_cons_ts(2)%vf(j)%sf(idwbuff(1)%beg:idwbuff(1)%end, &
                                  idwbuff(2)%beg:idwbuff(2)%end, &
                                  idwbuff(3)%beg:idwbuff(3)%end) => q_cons_ts_pool_host(:, :, :, j)
        end do

        do i = 1, num_ts
            @:ACC_SETUP_VFs(q_cons_ts(i))
            do j = 1, sys_size
                $:GPU_UPDATE(device='[q_cons_ts(i)%vf(j)]')
            end do
        end do
#else
Logic Error

The IGR (information geometric regularization) logic changes the alpha_K assignment for the single-fluid case from q_vf(advxb) to a hardcoded 1.0, which may break existing functionality and needs validation.

if (igr) then
    if (num_fluids == 1) then
        alpha_rho_K(1) = q_vf(contxb)%sf(k, l, r)
        alpha_K(1) = 1._wp
    else


qodo-merge-pro bot commented Jul 25, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Clarify error message for time stepper requirement
Suggestion impact: The suggestion was implemented by removing the confusing error-message check entirely and restructuring the code to handle the num_ts == 2 case conditionally instead of aborting

code diff:

 #ifdef FRONTIER_UNIFIED
-        if (proc_rank == 0 .and. num_ts /= 2) then
-            call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
-        end if
-
         ! Allocate to memory regions using hip calls
         ! that we will attach pointers to
         do i = 1, 3
@@ -133,17 +128,21 @@
         ! CCE see it can access this and will leave it on the host. It will stay on the host so long as HSA_XNACK=1
         ! NOTE: WE CANNOT DO ATOMICS INTO THIS MEMORY. We have to change a property to use atomics here
         ! Otherwise leaving this as fine-grained will actually help performance since it can't be cached in GPU L2
-        call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))
+        if (num_ts == 2) then
+            call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))
+        end if

The error message is confusing as it mentions "time_stepper = 2/3" but the
condition checks for num_ts /= 2. The message should clearly state the expected
value and be more descriptive about the requirement.

src/simulation/m_time_steppers.fpp [115-117]

 if (proc_rank == 0 .and. num_ts /= 2) then
-    call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
+    call s_mpi_abort("Frontier unified memory requires num_ts = 2 (RK2 time stepping)")
 end if
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a confusing error message and proposes a clearer version, which improves code maintainability and user experience.

Impact: Medium
Use distinct variables for z-direction interpolation

The upsampling algorithm reuses the temp(1) and temp(2) variables, which can lead to
confusion and potential bugs. Use distinct variable names for the z-direction
interpolation to improve code clarity and maintainability.

src/simulation/m_sim_helpers.fpp [318-324]

 temp(1) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz)
 temp(2) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz)
 temp(3) = (2._wp/3._wp)*temp(1) + (1._wp/3._wp)*temp(2)
 
-temp(1) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz + z_id)
-temp(2) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz + z_id)
-temp(4) = (2._wp/3._wp)*temp(1) + (1._wp/3._wp)*temp(2)
+temp_z1 = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz + z_id)
+temp_z2 = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz + z_id)
+temp(4) = (2._wp/3._wp)*temp_z1 + (1._wp/3._wp)*temp_z2
Suggestion importance[1-10]: 5


Why: The suggestion improves code readability by using distinct temporary variables for the z-direction interpolation, which makes the algorithm easier to follow.

Impact: Low


codecov bot commented Jul 25, 2025

Codecov Report

❌ Patch coverage is 27.20207% with 281 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.33%. Comparing base (52c0aac) to head (bf56824).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_time_steppers.fpp 27.63% 52 Missing and 3 partials ⚠️
src/common/m_helper.fpp 0.00% 53 Missing ⚠️
src/pre_process/m_data_output.fpp 22.91% 34 Missing and 3 partials ⚠️
src/simulation/m_data_output.fpp 18.18% 29 Missing and 7 partials ⚠️
src/simulation/m_start_up.fpp 16.66% 28 Missing and 2 partials ⚠️
src/post_process/m_data_input.f90 33.33% 23 Missing and 3 partials ⚠️
src/common/m_mpi_common.fpp 0.00% 20 Missing ⚠️
src/pre_process/m_initial_condition.fpp 30.00% 5 Missing and 2 partials ⚠️
src/post_process/m_start_up.f90 20.00% 3 Missing and 1 partial ⚠️
src/common/m_variables_conversion.fpp 50.00% 2 Missing and 1 partial ⚠️
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
- Coverage   43.67%   43.33%   -0.34%     
==========================================
  Files          70       70              
  Lines       19818    20046     +228     
  Branches     2473     2510      +37     
==========================================
+ Hits         8655     8687      +32     
- Misses       9644     9824     +180     
- Partials     1519     1535      +16     


@sbryngelson
Member

let me know how you want to deal with this PR

@wilfonba
Collaborator Author

@sbryngelson I need to do two quick visualizations and then some profiles with UVM on Frontier. Can you trigger the benchmarking to see if anything else slowed down unexpectedly?

@sbryngelson
Member

yes

@sbryngelson
Member

According to the benchmarks, IGR is 17% slower, so it failed the CPU benchmark. IGR is faster on the GPU cases for both NVIDIA and AMD GPUs; not sure what to make of that... perhaps you are including some other things that make it faster...

@wilfonba
Collaborator Author

According to the benchmarks, IGR is 17% slower, so it failed the CPU benchmark. IGR is faster on the GPU cases for both NVIDIA and AMD GPUs; not sure what to make of that... perhaps you are including some other things that make it faster...

Yeah, I noticed the CPU slowdown. I don't remember seeing that the GPU was faster, but maybe I just missed it. I'm still working on getting the expected unified-memory performance on Frontier.

@wilfonba
Collaborator Author

wilfonba commented Aug 2, 2025

@sbryngelson This should be ready to merge once the tests/benchmarks pass

@wilfonba
Collaborator Author

wilfonba commented Aug 2, 2025

CPU benchmarking is still showing a slowdown while GPU benchmarks show a slight speedup. My vote is to ignore that slowdown because GPU is what really matters.

Member

@sbryngelson sbryngelson left a comment


minor

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

Comments addressed. I'll need to verify that down-sampling still functions as expected, given the kind of changes I had to make.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

I can confirm that down-sampling is working as expected with the changes.

@sbryngelson
Member

I'm very confused why this is slower on CPU... and meaningfully so.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

I'm very confused why this is slower on CPU... and meaningfully so.

I have no clue. None of the changes I made should have slowed anything down. I observe a similar CPU slowdown on my Mac with GNU 14.2.0.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

It didn't slow down. The variable sys_size went from 6 on the master branch to 5 on this branch for the single-fluid case. Since the grind time is divided by the number of equations, the master branch appeared to be solving one more equation in the same amount of time. The weird thing is that the benchmark output doesn't seem to make sense. Here's what I'm seeing:

Device   Master grind time   PR grind time   Master/PR   Printed speedup
CPU      38.49               45.37           0.85        0.85
GPU      0.758               0.716           1.05        1.05

I've included the relevant printed log output from the CI. What's still weird is that the CPU reports a slowdown and the GPU reports a speedup. In the GPU logs, the results for the master branch are printed first, followed by the results for the PR. In the CPU logs, the order is reversed. Basically, the code isn't slower, but there's something off with the benchmarking diffs.
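
(A quick consistency check using the numbers above and the simulation exec times in the logs below: the CPU grind-time ratio is 45.38/38.49 ≈ 1.18, and rescaling the raw CPU simulation times by the equation count gives (6/5) × (424.39 s / 433.13 s) ≈ 1.18 as well, so the reported "slowdown" is accounted for by sys_size dropping from 6 to 5. The underlying simulation wall times actually improved: 433.1 s → 424.4 s on CPU and 40.6 s → 33.2 s on GPU.)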

CPU Master Log

igr:
description:
args:
- -c
- phoenix-bench
- -n
- '12'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/build/benchmarks/cc77/igr.yaml
- -c
- phoenix-bench
- -n
- '12'
- --
- --gbpp
- '1'
lock:
debug: false
gcov: false
gpu: false
mpi: true
single: false
unified: false
post_process:
exec: 5.0192976
pre_process:
exec: 1.8623343
simulation:
exec: 433.1306084
grind: 38.49284332
syscheck:
exec: 0.7191267

CPU PR Log

igr:
description:
args:
- -c
- phoenix-bench
- -n
- '12'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/build/benchmarks/c408/igr.yaml
- -c
- phoenix-bench
- -n
- '12'
- --
- --gbpp
- '1'
lock:
debug: false
gcov: false
gpu: false
mpi: true
single: false
unified: false
post_process:
exec: 4.6853891
pre_process:
exec: 1.7488573
simulation:
exec: 424.3913942
grind: 45.37525157
syscheck:
exec: 0.7126154

GPU Master Log

igr:
description:
args:
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/build/benchmarks/2b3e/igr.yaml
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
- --
- --gbpp
- '12'
lock:
debug: false
gcov: false
gpu: true
mpi: true
single: false
unified: false
post_process:
exec: 7.6285611
pre_process:
exec: 5.4373726
simulation:
exec: 40.6332197
grind: 0.75806516
syscheck:
exec: 1.3913716

GPU PR Log

igr:
description:
args:
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/build/benchmarks/393f/igr.yaml
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
- --
- --gbpp
- '12'
lock:
debug: false
gcov: false
gpu: true
mpi: true
single: false
unified: false
post_process:
exec: 6.8433884
pre_process:
exec: 5.1596627
simulation:
exec: 33.2053005
grind: 0.71615578
syscheck:
exec: 1.5515504

@sbryngelson sbryngelson merged commit 6b3fe8e into MFlowCode:master Aug 3, 2025
30 of 31 checks passed