
Frontier UVM and other preparations to reproduce 100T #967


Merged: 20 commits merged into MFlowCode:master on Aug 3, 2025

Conversation

Collaborator

@wilfonba wilfonba commented Jul 25, 2025

User description

Description

Adds UVM support and I/O downsampling for Frontier. Also removes some memory allocations from pre_process that cause out-of-memory errors for large simulations. I validated that the cases with modified golden files pass by hacking the serial I/O output to skip the line-count failures before regenerating the golden data. I still need to do some manual tests, but I'm marking this as ready for review now to speed up the merge process.

Type of change

Please delete options that are not relevant.

  • [x] New feature (non-breaking change which adds functionality)

Scope

  • [x] This PR comprises a set of related changes with a common goal

How Has This Been Tested?

Unified Roc-Prof Profile

resultsUNIFIED.csv

No Unified Roc-Prof Profile

resultsNOUNIFIED.csv

Downsample verification. The left frame shows the full resolution and the right frame shows the downsampled result. The differences are present but difficult to see.

out.mp4

PR Type

Enhancement, Tests


Description

Frontier Unified Memory Support: Implemented unified memory allocation for Frontier HPC system with HIP memory pools and conditional compilation flags
Data Downsampling Feature: Added comprehensive downsampling functionality across pre-process, simulation, and post-process modules with 3x3x3 averaging algorithm
IGR (Information Geometric Regularization) Fixes: Fixed IGR alpha calculations and variable conversion for single-fluid cases across multiple modules
Memory Optimization: Removed additional memory allocations from pre-process to prevent out-of-memory errors in large simulations
Optional Parameter Support: Made IB (immersed boundary) parameters optional in levelset computations and patch applications
MPI Integration: Updated MPI parameter broadcasts and data initialization to support downsampling features
Configuration Updates: Added CMake support for Frontier unified memory, updated case dictionaries, and modified Frontier template
Test Updates: Regenerated golden metadata files for multiple test cases with updated timestamps and configuration


Diagram Walkthrough

flowchart LR
  A["Frontier UVM"] --> B["HIP Memory Pools"]
  A --> C["Unified Memory Allocation"]
  D["Downsampling"] --> E["3x3x3 Averaging"]
  D --> F["Upsampling with Trilinear Interpolation"]
  G["IGR Fixes"] --> H["Single Fluid Alpha Calculation"]
  I["Memory Optimization"] --> J["Reduced Pre-process Allocations"]
  K["Optional Parameters"] --> L["IB Levelset Functions"]
  B --> M["Time Steppers"]
  E --> N["Data I/O Modules"]
  F --> N

File Walkthrough

Relevant files
Enhancement
23 files
m_data_output.fpp
Add downsampling support to pre-process data output           

src/pre_process/m_data_output.fpp

• Added m_helper module import and a q_cons_temp temporary variable for downsampling
• Modified function signatures to make parameters optional and change intent from in to inout
• Added conditional logic for IB (immersed boundary) data output
• Integrated downsampling functionality with buffer population and data processing

+123/-54
m_time_steppers.fpp
Implement Frontier unified memory for time steppers           

src/simulation/m_time_steppers.fpp

• Added Frontier unified memory support with HIP memory allocation
• Implemented device and host memory pools for conservative variables
• Modified Runge-Kutta time-stepping loops to support the unified memory layout
• Added conditional compilation blocks for the FRONTIER_UNIFIED preprocessor flag

+164/-3 
m_data_input.f90
Add downsampling support to post-process data input           

src/post_process/m_data_input.f90

• Added q_cons_temp temporary variable for downsampling support
• Modified grid coordinate reading to support uniform spacing for downsampled data
• Updated MPI data reading to handle downsampled file formats
• Added conditional logic for downsampling in parallel data file reading

+110/-32
m_data_output.fpp
Add downsampling support to simulation data output             

src/simulation/m_data_output.fpp

• Added m_boundary_common module import and q_cons_temp variable
• Modified function signatures to include a bc_type parameter and change intent
• Integrated downsampling functionality with conditional buffer allocation
• Updated MPI data writing to support downsampled output formats

+112/-40
m_start_up.fpp
Integrate downsampling support in simulation startup         

src/simulation/m_start_up.fpp

• Added down_sample parameter to input reading and validation
• Implemented downsampled data reading and upsampling functionality
• Modified data file reading to support both regular and downsampled formats
• Added conditional memory allocation for downsampling temporary variables

+94/-29 
m_global_parameters.fpp
Add downsampling parameter and fix IGR indexing                   

src/pre_process/m_global_parameters.fpp

• Added down_sample logical parameter with default value false
• Fixed IGR (information geometric regularization) index configuration for single-fluid cases
• Modified MPI data allocation to skip when downsampling is enabled
• Updated coordinate bounds configuration to include the igr_order parameter

+18/-12 
m_compute_levelset.fpp
Make levelset parameters optional and reorder signatures 

src/pre_process/m_compute_levelset.fpp

• Modified all levelset computation subroutines to make parameters optional
• Changed parameter order to put ib_patch_id first in function signatures
• Updated function interfaces for circle, airfoil, rectangle, cuboid, sphere, and cylinder levelsets

+21/-21 
m_global_parameters.fpp
Add downsampling and domain parameters to post-process     

src/post_process/m_global_parameters.fpp

• Added down_sample logical parameter and domain boundary variables
• Fixed IGR index configuration similar to the pre-process module
• Modified MPI data allocation to support downsampled array dimensions
• Updated parameter initialization with domain boundary defaults

+23/-7   
m_global_parameters.fpp
Add downsampling parameter to simulation globals                 

src/simulation/m_global_parameters.fpp

• Added down_sample parameter with GPU declaration support
• Fixed IGR index configuration for consistent behavior across modules
• Modified MPI data allocation to skip when downsampling is enabled
• Updated coordinate bounds configuration call with the igr_order parameter

+14/-11 
m_sim_helpers.fpp
Add data upsampling functionality and fix IGR alpha           

src/simulation/m_sim_helpers.fpp

• Fixed IGR alpha calculation to use a constant value for the single-fluid case
• Added s_upsample_data subroutine for converting downsampled data back to full resolution
• Implemented a trilinear interpolation algorithm for upsampling conservative variables (sketched below)

+41/-2   
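
For readers unfamiliar with the scheme, the following is a minimal sketch of trilinear upsampling of a 3x-coarsened, cell-centered field, using the 2/3 and 1/3 weights that appear in the reviewer suggestion later in this thread. The subroutine name, the edge clamping, and the single-variable interface are illustrative assumptions, not the actual s_upsample_data signature.

! Minimal upsampling sketch (illustrative, single variable; the real routine loops
! over sys_size and may handle boundaries via buffer cells). Fine grid is 0:m with
! m + 1 = 3*(m_ds + 1), and likewise in y and z.
subroutine s_upsample_sketch(q_coarse, q_fine, m_ds, n_ds, p_ds, m, n, p)
    implicit none
    integer, intent(in) :: m_ds, n_ds, p_ds, m, n, p
    real(kind(0d0)), intent(in) :: q_coarse(0:m_ds, 0:n_ds, 0:p_ds)
    real(kind(0d0)), intent(out) :: q_fine(0:m, 0:n, 0:p)
    integer :: i, j, k, ix, iy, iz, x_id, y_id, z_id
    real(kind(0d0)) :: c1, c2, plane_lo, plane_hi

    do k = 0, p
        do j = 0, n
            do i = 0, m
                ix = i/3; iy = j/3; iz = k/3
                ! Direction of the neighboring coarse cell: -1, 0, or +1
                x_id = mod(i, 3) - 1; y_id = mod(j, 3) - 1; z_id = mod(k, 3) - 1
                ! Clamp at the domain edges (assumption) so we never index outside q_coarse
                if (ix + x_id < 0 .or. ix + x_id > m_ds) x_id = 0
                if (iy + y_id < 0 .or. iy + y_id > n_ds) y_id = 0
                if (iz + z_id < 0 .or. iz + z_id > p_ds) z_id = 0
                ! Interpolate in x, then y, in the iz plane ...
                c1 = (2d0/3d0)*q_coarse(ix, iy, iz) + (1d0/3d0)*q_coarse(ix + x_id, iy, iz)
                c2 = (2d0/3d0)*q_coarse(ix, iy + y_id, iz) + (1d0/3d0)*q_coarse(ix + x_id, iy + y_id, iz)
                plane_lo = (2d0/3d0)*c1 + (1d0/3d0)*c2
                ! ... and in the iz + z_id plane, then blend in z
                c1 = (2d0/3d0)*q_coarse(ix, iy, iz + z_id) + (1d0/3d0)*q_coarse(ix + x_id, iy, iz + z_id)
                c2 = (2d0/3d0)*q_coarse(ix, iy + y_id, iz + z_id) + (1d0/3d0)*q_coarse(ix + x_id, iy + y_id, iz + z_id)
                plane_hi = (2d0/3d0)*c1 + (1d0/3d0)*c2
                ! An offset of 0 makes both terms identical, so block centers reduce to the coarse value
                q_fine(i, j, k) = (2d0/3d0)*plane_lo + (1d0/3d0)*plane_hi
            end do
        end do
    end do
end subroutine s_upsample_sketch

With 3x coarsening, each off-center fine-cell center sits one third of a coarse spacing from its own coarse center and two thirds from the neighbor's, which is where the 2/3 and 1/3 weights come from.
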
m_patches.fpp
Make IB parameters optional in patch application                 

src/pre_process/m_patches.fpp

• Made IB-related parameters optional in the s_apply_domain_patches subroutine
• Updated levelset function calls to use the new parameter ordering
• Added conditional logic for IB marker and levelset processing

+11/-10 
m_assign_variables.fpp
Add IGR conditional logic to variable assignment                 

src/pre_process/m_assign_variables.fpp

• Added conditional logic to skip volume fraction assignment for IGR single-fluid cases
• Updated patch assignment and smoothing functions with IGR checks
• Modified variable assignment to handle IGR special cases properly

+20/-14 
m_initial_condition.fpp
Make IB allocations conditional in initial conditions       

src/pre_process/m_initial_condition.fpp

• Made IB-related allocations conditional on the ib flag
• Updated the patch application call to handle optional IB parameters
• Added proper initialization and deallocation of IB-related variables

+16/-7   
m_helper.fpp
Add data downsampling functionality                                           

src/common/m_helper.fpp

• Added s_downsample_data subroutine to the public interface
• Implemented a 3x3x3 averaging algorithm for downsampling conservative variables (sketched below)
• Added GPU data transfer support for downsampling operations

+47/-1   
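
As a point of reference, the 3x3x3 averaging itself is simple. The sketch below shows the core idea assuming cell-centered data and 0-based extents; the subroutine name and plain-array interface are illustrative, not s_downsample_data's actual interface, which operates on MFC's derived field types and includes the GPU data transfer noted above.

! Minimal 3x3x3 block-averaging sketch (illustrative). Fine extents 0:m, 0:n, 0:p
! with m + 1 = 3*(m_ds + 1), etc., matching the divisibility-by-3 requirement
! enforced in m_checker.fpp further below.
subroutine s_downsample_sketch(q_fine, q_coarse, m, n, p, m_ds, n_ds, p_ds, sys_size)
    implicit none
    integer, intent(in) :: m, n, p, m_ds, n_ds, p_ds, sys_size
    real(kind(0d0)), intent(in) :: q_fine(0:m, 0:n, 0:p, 1:sys_size)
    real(kind(0d0)), intent(out) :: q_coarse(0:m_ds, 0:n_ds, 0:p_ds, 1:sys_size)
    integer :: i, j, k, l

    do l = 1, sys_size
        do k = 0, p_ds
            do j = 0, n_ds
                do i = 0, m_ds
                    ! Each coarse cell is the arithmetic mean of its 3x3x3 block of fine cells
                    q_coarse(i, j, k, l) = sum(q_fine(3*i:3*i + 2, 3*j:3*j + 2, 3*k:3*k + 2, l))/27d0
                end do
            end do
        end do
    end do
end subroutine s_downsample_sketch
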
m_mpi_proxy.fpp
Add downsampling parameters to MPI broadcasts                       

src/post_process/m_mpi_proxy.fpp

• Added down_sample to the logical parameter broadcast list
• Added domain boundary parameters to the real parameter broadcast list
• Updated MPI communication to include the new downsampling parameters

+5/-2     
m_boundary_common.fpp
Add intent specifications to boundary functions                   

src/common/m_boundary_common.fpp

• Added intent(in) specifications to function parameters
• Updated function signatures for better const-correctness
• Modified boundary condition file writing functions with proper intent declarations

+7/-7     
m_mpi_common.fpp
Add downsampled MPI data initialization                                   

src/common/m_mpi_common.fpp

• Added s_initialize_mpi_data_ds subroutine for downsampled MPI data initialization
• Implemented downsampled array size calculations and MPI type creation (see the sketch below)
• Added conditional compilation for post-process vs. other modules

+51/-0   
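
For context on the "MPI type creation" bullet above, the usual pattern for describing each rank's block of a downsampled global array is an MPI subarray type. The sketch below only illustrates that pattern, reusing the extent names from this PR (m_ds, m_glb_ds, ...) plus a hypothetical start_idx_ds offset; it may not match how s_initialize_mpi_data_ds is actually written.

! Illustrative sketch: build an MPI subarray type describing this rank's portion
! of the downsampled global array (0-based extents as elsewhere in MFC).
subroutine s_build_ds_view(m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds, start_idx_ds, view_type)
    use mpi
    implicit none
    integer, intent(in) :: m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds
    integer, intent(in) :: start_idx_ds(3) ! hypothetical 0-based global offset of this rank's block
    integer, intent(out) :: view_type
    integer :: sizes(3), subsizes(3), starts(3), ierr

    sizes = [m_glb_ds + 1, n_glb_ds + 1, p_glb_ds + 1] ! global downsampled grid
    subsizes = [m_ds + 1, n_ds + 1, p_ds + 1]          ! this rank's local block
    starts = start_idx_ds                              ! MPI subarray starts are 0-based

    call MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, MPI_ORDER_FORTRAN, &
                                  MPI_DOUBLE_PRECISION, view_type, ierr)
    call MPI_TYPE_COMMIT(view_type, ierr)
end subroutine s_build_ds_view

Such a type is typically handed to MPI_FILE_SET_VIEW so each rank reads or writes its block of the downsampled file directly.
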
m_mpi_proxy.fpp
Add downsampling parameters to pre-process MPI                     

src/pre_process/m_mpi_proxy.fpp

• Added igr_order to the integer parameter broadcast list
• Added down_sample to the logical parameter broadcast list
• Updated MPI parameter synchronization for the new downsampling features

+2/-2     
m_start_up.f90
Add downsampling support to post-process startup                 

src/post_process/m_start_up.f90

• Added downsampling parameters to input file reading
• Modified grid dimensions when downsampling is enabled
• Updated global cell count calculation with proper integer types

+9/-2     
m_helper_basic.fpp
Update coordinate bounds configuration for IGR                     

src/common/m_helper_basic.fpp

• Updated s_configure_coordinate_bounds to include an igr_order parameter
• Modified buffer size calculation for IGR to use igr_order instead of hardcoded values
• Improved coordinate bounds configuration for IGR cases

+3/-7     
m_start_up.fpp
Add downsampling parameter to pre-process startup               

src/pre_process/m_start_up.fpp

• Added down_sample parameter to input file reading
• Updated the data file writing call to handle optional IB parameters
• Added conditional logic for IB parameter passing

+7/-2     
m_checker.fpp
Add validation checks for downsampling feature                     

src/pre_process/m_checker.fpp

• Added comprehensive validation checks for the downsampling feature
• Enforced requirements: parallel I/O, IGR, 3D, file-per-process, and grid divisibility by 3 (see the sketch below)
• Added error messages for invalid downsampling configurations

+8/-0     
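
The guards listed above translate into straightforward checks. The following is a minimal sketch, assuming flag names like parallel_io, igr, and file_per_process and using a plain error stop; MFC's m_checker.fpp actually reports errors through its own checker/abort machinery, so treat this purely as an illustration.

! Illustrative validation sketch for the down_sample option (0-based extents m, n, p).
subroutine s_check_down_sample(down_sample, parallel_io, igr, file_per_process, m, n, p)
    implicit none
    logical, intent(in) :: down_sample, parallel_io, igr, file_per_process
    integer, intent(in) :: m, n, p

    if (.not. down_sample) return
    if (.not. parallel_io) error stop "down_sample requires parallel_io = T"
    if (.not. igr) error stop "down_sample requires igr = T"
    if (p == 0) error stop "down_sample requires a 3D case"
    if (.not. file_per_process) error stop "down_sample requires file-per-process I/O"
    ! Each dimension must hold a whole number of 3x3x3 blocks
    if (mod(m + 1, 3) /= 0 .or. mod(n + 1, 3) /= 0 .or. mod(p + 1, 3) /= 0) then
        error stop "down_sample requires each grid dimension to be divisible by 3"
    end if
end subroutine s_check_down_sample
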
m_mpi_proxy.fpp
Add downsampling parameter to simulation MPI                         

src/simulation/m_mpi_proxy.fpp

• Added down_sample to the logical parameter broadcast list
• Updated MPI parameter synchronization for the simulation module

+1/-1     
Bug fix
1 file
m_variables_conversion.fpp
Fix IGR variable conversion for single fluid cases             

src/common/m_variables_conversion.fpp

• Fixed IGR alpha calculations to use a constant value for single-fluid cases
• Added conditional logic to skip advection variable processing for IGR single fluid
• Updated primitive and conservative variable conversion routines

+14/-10 
Configuration changes
4 files
case_dicts.py
Add downsampling parameters to case dictionaries                 

toolchain/mfc/run/case_dicts.py

• Added down_sample parameter as a logical type to the common parameters
• Added domain boundary parameters for the x, y, z coordinates
• Updated parameter dictionaries for pre-process configuration

+5/-0     
frontier.mako
Add unified memory environment variable to Frontier template

toolchain/templates/frontier.mako

• Added conditional export of the CRAY_ACC_USE_UNIFIED_MEM=1 environment variable
• Enabled unified memory support when the unified flag is set

+4/-0     
CMakeLists.txt
Add Frontier unified memory CMake configuration                   

CMakeLists.txt

• Added Frontier unified memory support for the Cray compiler
• Added the -DFRONTIER_UNIFIED compile flag when MFC_Unified is enabled
• Integrated HIP and hipfort library linking for unified memory

+6/-0     
.typos.toml
Add HSA to typos whitelist                                                             

.typos.toml

• Added HSA to the list of accepted words in typos configuration

+1/-0     
Tests
13 files
golden-metadata.txt
Update test metadata for 106C0BE6                                               

tests/106C0BE6/golden-metadata.txt

• Updated test metadata with new timestamps and git commit information

• Changed build configuration order and environment details

+22/-55 
golden-metadata.txt
Update test metadata for CEAF553A                                               

tests/CEAF553A/golden-metadata.txt

• Updated test metadata with new timestamps and git commit information

• Changed build configuration order and environment details

+22/-55 
golden-metadata.txt
Update test metadata for single test execution                     

tests/6077374F/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation command from batch test generation to single test
• Reordered build configuration sections and updated environment variables
• Modified hostname from network address to local machine name

+20/-53 
golden-metadata.txt
Update test metadata for single test execution                     

tests/AE3FC5CB/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname and removed documentation section

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/0045D9F8/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/4440D46B/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/C7A6B609/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+19/-52 
golden-metadata.txt
Update test metadata for single test execution                     

tests/E8979E4A/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+16/-49 
golden-metadata.txt
Update test metadata for single test execution                     

tests/AFBCBDFA/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Reordered build configuration sections and populated environment variables
• Updated hostname from network address to local machine

+16/-49 
golden-metadata.txt
Update test metadata for single test execution                     

tests/37DDE283/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables and removed documentation section
• Updated hostname and reordered build configuration sections

+14/-47 
golden-metadata.txt
Update test metadata for single test execution                     

tests/EB58AF7F/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables and removed simulation section
• Updated hostname and reordered build configuration sections

+14/-47 
golden-metadata.txt
Update test metadata for single test execution                     

tests/A57E30FE/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Changed invocation from batch to single test generation
• Populated environment variables with compiler paths
• Updated hostname from network address to local machine

+10/-43 
golden-metadata.txt
Update test metadata for single test execution                     

tests/E49EF7B6/golden-metadata.txt

• Updated test metadata with new timestamp and git commit hash
• Reordered build configuration sections
• Removed documentation section from configuration
• Updated hostname from network address to local machine

+12/-45 
Formatting
1 file
modules
Clean up module configuration formatting                                 

toolchain/modules

• Minor formatting cleanup removing trailing whitespace
• Updated module configuration for various HPC systems

+2/-2     

@Copilot Copilot AI review requested due to automatic review settings July 25, 2025 14:49
@wilfonba wilfonba requested review from a team and sbryngelson as code owners July 25, 2025 14:49
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR prepares for 100T simulations on Frontier by adding unified virtual memory (UVM) support and I/O downsampling capabilities. It removes memory allocation bottlenecks from pre_process to handle large-scale simulations and updates test golden files to reflect minor numerical differences.

Key Changes

  • Added UVM support for Frontier via environment variable configuration
  • Implemented I/O downsampling functionality with new parameter types
  • Updated test golden files to reflect numerical precision changes

Reviewed Changes

Copilot reviewed 44 out of 55 changed files in this pull request and generated no comments.

File Description
toolchain/templates/frontier.mako Adds conditional UVM environment variable support
toolchain/mfc/run/case_dicts.py Adds downsampling parameter and domain boundary parameters
tests/EB58AF7F/golden.txt Updates golden test data with minor numerical changes
tests/EB58AF7F/golden-metadata.txt Updates test metadata with new generation timestamp
Comments suppressed due to low confidence (1)

toolchain/templates/frontier.mako:47

  • [nitpick] The variable name 'unified' is ambiguous in this context. Consider using a more descriptive name like 'enable_unified_memory' or 'use_uvm' to clearly indicate this controls unified virtual memory.
%if unified:


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Duplicate Code

The downsampling logic is duplicated across multiple files (pre_process, simulation, post_process) with nearly identical implementations. This violates the DRY principle and makes maintenance difficult.

    if ((mod(m + 1, 3) > 0) .or. (mod(n + 1, 3) > 0) .or. (mod(p + 1, 3) > 0)) then
        print *, "WARNING: ATTEMPTING TO RUN DOWN SAMPLING WITH local problem size not divisible by 3"
    end if
    call s_populate_variables_buffers(bc_type, q_cons_vf)
    call s_downsample_data(q_cons_vf, q_cons_temp, &
                           m_ds, n_ds, p_ds, m_glb_ds, n_glb_ds, p_glb_ds)
end if
Possible Issue

The FRONTIER_UNIFIED memory management uses complex pointer assignments and HIP memory allocation that could lead to memory corruption or access violations if not properly synchronized between host and device.

#ifdef FRONTIER_UNIFIED
        if (proc_rank == 0 .and. num_ts /= 2) then
            call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
        end if

        ! Allocate to memory regions using hip calls
        ! that we will attach pointers to
        do i = 1, 3
            pool_dims(i) = idwbuff(i)%end - idwbuff(i)%beg + 1
            pool_starts(i) = idwbuff(i)%beg
        end do
        pool_dims(4) = sys_size
        pool_starts(4) = 1

        ! Doing hipMalloc then mapping should be most performant
        call hipCheck(hipMalloc(q_cons_ts_pool_device, dims8=pool_dims, lbounds8=pool_starts))
        ! Without this map CCE will still create a device copy, because it's silly like that
        call acc_map_data(q_cons_ts_pool_device, c_loc(q_cons_ts_pool_device), c_sizeof(q_cons_ts_pool_device))

        ! CCE see it can access this and will leave it on the host. It will stay on the host so long as HSA_XNACK=1
        ! NOTE: WE CANNOT DO ATOMICS INTO THIS MEMORY. We have to change a property to use atomics here
        ! Otherwise leaving this as fine-grained will actually help performance since it can't be cached in GPU L2
        call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))

        do j = 1, sys_size
            ! q_cons_ts(1) lives on the device
            q_cons_ts(1)%vf(j)%sf(idwbuff(1)%beg:idwbuff(1)%end, &
                                  idwbuff(2)%beg:idwbuff(2)%end, &
                                  idwbuff(3)%beg:idwbuff(3)%end) => q_cons_ts_pool_device(:, :, :, j)
            ! q_cons_ts(2) lives on the host
            q_cons_ts(2)%vf(j)%sf(idwbuff(1)%beg:idwbuff(1)%end, &
                                  idwbuff(2)%beg:idwbuff(2)%end, &
                                  idwbuff(3)%beg:idwbuff(3)%end) => q_cons_ts_pool_host(:, :, :, j)
        end do

        do i = 1, num_ts
            @:ACC_SETUP_VFs(q_cons_ts(i))
            do j = 1, sys_size
                $:GPU_UPDATE(device='[q_cons_ts(i)%vf(j)]')
            end do
        end do
#else
Logic Error

The IGR (information geometric regularization) logic changes the alpha_K assignment for the single-fluid case from q_vf(advxb) to a hardcoded 1.0, which may break existing functionality and needs validation.

if (igr) then
    if (num_fluids == 1) then
        alpha_rho_K(1) = q_vf(contxb)%sf(k, l, r)
        alpha_K(1) = 1._wp
    else


qodo-merge-pro bot commented Jul 25, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Clarify error message for time stepper requirement
Suggestion impact: The suggestion was implemented by removing the confusing error-message check entirely and restructuring the code to handle the num_ts == 2 case conditionally instead of aborting

code diff:

 #ifdef FRONTIER_UNIFIED
-        if (proc_rank == 0 .and. num_ts /= 2) then
-            call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
-        end if
-
         ! Allocate to memory regions using hip calls
         ! that we will attach pointers to
         do i = 1, 3
@@ -133,17 +128,21 @@
         ! CCE see it can access this and will leave it on the host. It will stay on the host so long as HSA_XNACK=1
         ! NOTE: WE CANNOT DO ATOMICS INTO THIS MEMORY. We have to change a property to use atomics here
         ! Otherwise leaving this as fine-grained will actually help performance since it can't be cached in GPU L2
-        call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))
+        if (num_ts == 2) then
+            call hipCheck(hipMallocManaged(q_cons_ts_pool_host, dims8=pool_dims, lbounds8=pool_starts, flags=hipMemAttachGlobal))
+        end if

The error message is confusing as it mentions "time_stepper = 2/3" but the
condition checks for num_ts /= 2. The message should clearly state the expected
value and be more descriptive about the requirement.

src/simulation/m_time_steppers.fpp [115-117]

 if (proc_rank == 0 .and. num_ts /= 2) then
-    call s_mpi_abort("Frontier unified memory assumes time_stepper = 2/3")
+    call s_mpi_abort("Frontier unified memory requires num_ts = 2 (RK2 time stepping)")
 end if
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a confusing error message and proposes a clearer version, which improves code maintainability and user experience.

Impact: Medium
Use distinct variables for z-direction interpolation

The upsampling algorithm reuses the temp(1) and temp(2) variables, which can lead to
confusion and potential bugs. Use distinct variable names for the z-direction
interpolation to improve code clarity and maintainability.

src/simulation/m_sim_helpers.fpp [318-324]

 temp(1) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz)
 temp(2) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz)
 temp(3) = (2._wp/3._wp)*temp(1) + (1._wp/3._wp)*temp(2)
 
-temp(1) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz + z_id)
-temp(2) = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz + z_id)
-temp(4) = (2._wp/3._wp)*temp(1) + (1._wp/3._wp)*temp(2)
+temp_z1 = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy, iz + z_id)
+temp_z2 = (2._wp/3._wp)*q_cons_temp(i)%sf(ix, iy + y_id, iz + z_id) + (1._wp/3._wp)*q_cons_temp(i)%sf(ix + x_id, iy + y_id, iz + z_id)
+temp(4) = (2._wp/3._wp)*temp_z1 + (1._wp/3._wp)*temp_z2
Suggestion importance[1-10]: 5


Why: The suggestion improves code readability by using distinct temporary variables for the z-direction interpolation, which makes the algorithm easier to follow.

Impact: Low


codecov bot commented Jul 25, 2025

Codecov Report

❌ Patch coverage is 27.20207% with 281 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.33%. Comparing base (52c0aac) to head (bf56824).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_time_steppers.fpp 27.63% 52 Missing and 3 partials ⚠️
src/common/m_helper.fpp 0.00% 53 Missing ⚠️
src/pre_process/m_data_output.fpp 22.91% 34 Missing and 3 partials ⚠️
src/simulation/m_data_output.fpp 18.18% 29 Missing and 7 partials ⚠️
src/simulation/m_start_up.fpp 16.66% 28 Missing and 2 partials ⚠️
src/post_process/m_data_input.f90 33.33% 23 Missing and 3 partials ⚠️
src/common/m_mpi_common.fpp 0.00% 20 Missing ⚠️
src/pre_process/m_initial_condition.fpp 30.00% 5 Missing and 2 partials ⚠️
src/post_process/m_start_up.f90 20.00% 3 Missing and 1 partial ⚠️
src/common/m_variables_conversion.fpp 50.00% 2 Missing and 1 partial ⚠️
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
- Coverage   43.67%   43.33%   -0.34%     
==========================================
  Files          70       70              
  Lines       19818    20046     +228     
  Branches     2473     2510      +37     
==========================================
+ Hits         8655     8687      +32     
- Misses       9644     9824     +180     
- Partials     1519     1535      +16     


@sbryngelson
Member

let me know how you want to deal with this PR

@wilfonba
Collaborator Author

@sbryngelson I need to do two quick visualizations and then some profiles with UVM on Frontier. Can you trigger the benchmarking to see if anything else slowed down unexpectedly?

@sbryngelson
Member

yes

@sbryngelson
Member

According to the benchmarks, IGR is 17% slower, so it failed the CPU benchmark. IGR is faster on the GPU cases for both NVIDIA and AMD GPUs; not sure what to make of that... perhaps you are including some other things that make it faster...

@wilfonba
Collaborator Author

According to the benchmarks, IGR is 17% slower, so it failed the CPU benchmark. IGR is faster on the GPU cases for both NVIDIA and AMD GPUs; not sure what to make of that... perhaps you are including some other things that make it faster...

Yeah, I noticed the CPU slowdown. I don't remember seeing that the GPU was faster, but maybe I just missed it. I'm still working on getting the expected unified-memory performance on Frontier.

@wilfonba
Collaborator Author

wilfonba commented Aug 2, 2025

@sbryngelson This should be ready to merge once the tests/benchmarks pass

@wilfonba
Collaborator Author

wilfonba commented Aug 2, 2025

CPU benchmarking is still showing a slowdown while GPU benchmarks show a slight speedup. My vote is to ignore that slowdown because GPU is what really matters.

Member

@sbryngelson sbryngelson left a comment


minor

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

Comments addressed. I'll need to verify that down-sampling still functions as expected, given the kind of changes I had to make.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

I can confirm that down-sampling is working as expected with the changes.

@sbryngelson
Member

I'm very confused why this is slower on CPU... and meaningfully so.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

I'm very confused why this is slower on CPU... and meaningfully so.

I have no clue. None of the changes I made should have slowed anything down. I observe a similar CPU slowdown on my Mac with GNU 14.2.0.

@wilfonba
Collaborator Author

wilfonba commented Aug 3, 2025

It didn't slow down. The variable sys_size went from 6 on the master branch to 5 on this branch for the single-fluid case. Since the grind time is divided by the number of equations, the master branch appeared to be solving one more equation in the same amount of time. The weird thing is that the benchmark output doesn't seem to make sense. Here's what I'm seeing:

Device   Master grind time   PR grind time   Master/PR   Printed speedup
CPU      38.49               45.37           0.85        0.85
GPU      0.758               0.716           1.05        1.05

I've included the relevant printed log output from the CI. What's still weird is that the CPU reports a slowdown and the GPU reports a speedup. In the GPU logs, the results for the master branch are printed first, followed by the results for the PR. In the CPU logs, the order is reversed. Basically, the code isn't slower, but there's something off with the benchmarking diffs.
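
(A quick consistency check using the numbers above and the simulation exec times in the logs below: the CPU grind-time ratio is 45.38/38.49 ≈ 1.18, and rescaling the raw CPU simulation times by the equation count gives (6/5) × (424.39 s / 433.13 s) ≈ 1.18 as well, so the reported "slowdown" is accounted for by sys_size dropping from 6 to 5. The underlying simulation wall times actually improved: 433.1 s → 424.4 s on CPU and 40.6 s → 33.2 s on GPU.)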

CPU Master Log

igr:
description:
args:
- -c
- phoenix-bench
- -n
- '12'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/master/build/benchmarks/cc77/igr.yaml
- -c
- phoenix-bench
- -n
- '12'
- --
- --gbpp
- '1'
lock:
debug: false
gcov: false
gpu: false
mpi: true
single: false
unified: false
post_process:
exec: 5.0192976
pre_process:
exec: 1.8623343
simulation:
exec: 433.1306084
grind: 38.49284332
syscheck:
exec: 0.7191267

CPU PR Log

igr:
description:
args:
- -c
- phoenix-bench
- -n
- '12'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-4/_work/MFC/MFC/pr/build/benchmarks/c408/igr.yaml
- -c
- phoenix-bench
- -n
- '12'
- --
- --gbpp
- '1'
lock:
debug: false
gcov: false
gpu: false
mpi: true
single: false
unified: false
post_process:
exec: 4.6853891
pre_process:
exec: 1.7488573
simulation:
exec: 424.3913942
grind: 45.37525157
syscheck:
exec: 0.7126154

GPU Master Log

igr:
description:
args:
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/master/build/benchmarks/2b3e/igr.yaml
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
- --
- --gbpp
- '12'
lock:
debug: false
gcov: false
gpu: true
mpi: true
single: false
unified: false
post_process:
exec: 7.6285611
pre_process:
exec: 5.4373726
simulation:
exec: 40.6332197
grind: 0.75806516
syscheck:
exec: 1.3913716

GPU PR Log

igr:
description:
args:
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
path: /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/benchmarks/igr/case.py
slug: igr
output_summary:
invocation:
- run
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/benchmarks/igr/case.py
- --case-optimization
- --targets
- pre_process
- simulation
- post_process
- --output-summary
- /storage/scratch1/6/sbryngelson3/mfc-runners/actions-runner-2/_work/MFC/MFC/pr/build/benchmarks/393f/igr.yaml
- -c
- phoenix-bench
- --gpu
- -g
- '0'
- '1'
- -n
- '2'
- --
- --gbpp
- '12'
lock:
debug: false
gcov: false
gpu: true
mpi: true
single: false
unified: false
post_process:
exec: 6.8433884
pre_process:
exec: 5.1596627
simulation:
exec: 33.2053005
grind: 0.71615578
syscheck:
exec: 1.5515504

@sbryngelson sbryngelson merged commit 6b3fe8e into MFlowCode:master Aug 3, 2025
30 of 31 checks passed