
NVHPC takes >10 minutes to compile params.cpp #922

Open
bprather opened this issue Aug 15, 2023 · 6 comments
Comments

@bprather
Collaborator

When compiling Parthenon with NVHPC (every version, machine, and host & target architecture I tested), the compiler takes a very long time on one specific file, params.cpp. In my testing it adds roughly 10 minutes to the compile on both desktop and HPC machines, all of it spent compiling this one file on one core, mostly in the cpp2 subprogram.

The compiler is also pretty slow on some other later-stage Parthenon files (e.g. metadata.cpp), but params.cpp is by far the worst. This appears to be due to a change within the last ~8 months, and likely on the more recent side of that window, since previous versions I've used with KHARMA do not show this behavior with any NVHPC version. Now that I've tested enough to know it's present in all versions of NVHPC, I'll bisect locally and hopefully isolate the relevant commit.

Just wanted to check whether others have seen similar behavior, and if so whether it's worth surfacing to Nvidia.

@Yurlungur
Collaborator

I have not observed this, but I suspect I know what it is. params.cpp now contains a lot of template machinery to support writing params of different non-scalar types to the restart file. It could probably be split into several implementation files to let the build parallelize, if that would be useful.

@bprather
Collaborator Author

I can verify it's present in Parthenon proper when using the Kokkos nvcc_wrapper with nvc++ to compile for device (incidentally, the process also uses up to ~11 GB of RAM by itself, which could be an issue for even getting the code compiled on smaller machines). However, the issue isn't present when compiling with nvc++ for the host, which is interesting. The offending process, cpp2, definitely seems to be processing CUDA front-end output for the file, so if we could find a way to avoid compiling (or attempting to compile) its functions for device, we'd avoid the issue entirely.

That said, it's definitely a compiler issue at its core. Going to try splitting up the file next as @Yurlungur suggested, but maybe we should also think about simplifying it so we don't have two processes both trying to chew on these templates?

@Yurlungur
Collaborator

> That said, it's definitely a compiler issue at its core. Going to try splitting up the file next as @Yurlungur suggested, but maybe we should also think about simplifying it so we don't have two processes both trying to chew on these templates?

If you can figure out a way to simplify that templated code, by all means. I think there's some necessary complexity here, though: we're basically compiling the same function many times for different input/output data types for file IO, so there's simply a lot of code for the compiler to get through. It might be possible to move some of the code out of the lowest-level functions in the header file into implementation files, but I don't see any obvious wins there.

@bprather
Collaborator Author

bprather commented Oct 6, 2023

Just tried to compile with NVHPC/NVC++ 23.9 and... it segfaults. The crash is in cpp2 as you might expect, when processing params.cudafe1.cpp. So, if we can't figure out how to avoid compiling this particular code with CUDA, this might be a good time to bump this up to Nvidia.

@pgrete
Collaborator

pgrete commented Oct 6, 2023

Looks like nvc++ is now officially a supported compiler for Kokkos 4 (https://kokkos.github.io/kokkos-core-wiki/requirements.html#kokkos-4-x), though I'm not sure whether that covers host or device code.
Last time I checked, nvc++ was not supported for device builds.
So if this also happens for host builds, then it's definitely time to raise this with Nvidia.

@bprather
Collaborator Author

bprather commented Dec 1, 2023

This issue has gotten worse (>30-minute compiles with NVHPC 22.11 and above, which might as well be a compile error). It doesn't disappear even if I place

#ifndef __CUDA_ARCH__
...
#endif

around the entire file to prevent device compilation (though it seems to help a bit), so the slowdown is at least partially host-side as long as nvcc is the wrapper doing the processing. (FWIW, guarding on __CUDACC__ instead results in a linker error, so nvcc must be processing the host code too.) Thus, I don't think selectively avoiding compilation will help us here.

It's also starting to be a problem in GCC-based compiles with recent versions (on the now-quaint order of 8-10-minute compiles), and it seems to be the long pole in all Parthenon compiles, even with LLVM-based compilers.

So, I'm bumping this: if anyone has simple modifications to params.cpp that would make it any faster to compile and would be easy to test, I'd be eager to try them out. Otherwise I'll take a look next week, though I think my knowledge of templates is insufficient to produce a quick or light-handed fix; I'd likely default to breaking up all the instantiations into different translation units.
