add nvtx equivalent for rocm #940

Open
wants to merge 5 commits into
base: develop

simonpintarelli
Collaborator

@simonpintarelli simonpintarelli commented Dec 15, 2023

Adds roctx
https://github.com/ROCm/roctracer/tree/amd-master?tab=readme-ov-file#roctx-api
whose API is equivalent to the nvtx API currently in use.

  • roctracer lacks a CMake config file, so this adds FindRocTX.cmake to cmake/modules
  • note: +nvtx isn't part of the CI
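
Since roctx and nvtx both offer push/pop range calls, one way to share call sites between the two backends is a thin compile-time shim. The following is a minimal sketch, not the code in this PR: the macros `EXAMPLE_USE_ROCTX` and `EXAMPLE_USE_NVTX`, the `marker` namespace, and the fallback log are hypothetical names introduced for illustration, and the roctx header path is an assumption that may differ between ROCm releases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// EXAMPLE_USE_ROCTX / EXAMPLE_USE_NVTX are hypothetical macros, not taken from the PR.
#if defined(EXAMPLE_USE_ROCTX)
#include <roctracer/roctx.h> // assumed header location; may differ between ROCm releases
#elif defined(EXAMPLE_USE_NVTX)
#include <nvToolsExt.h>
#endif

namespace marker {

// Fallback sink used only when no profiler backend is compiled in,
// so the shim stays observable (and testable) on any machine.
inline std::vector<std::string>& fallback_log()
{
    static std::vector<std::string> log;
    return log;
}

// Open a named range on whichever backend was selected at compile time.
inline void range_push(const char* label)
{
    assert(label != nullptr);
#if defined(EXAMPLE_USE_ROCTX)
    roctxRangePush(label);
#elif defined(EXAMPLE_USE_NVTX)
    nvtxRangePushA(label);
#else
    fallback_log().push_back(std::string("push:") + label);
#endif
}

// Close the most recently opened range.
inline void range_pop()
{
#if defined(EXAMPLE_USE_ROCTX)
    roctxRangePop();
#elif defined(EXAMPLE_USE_NVTX)
    nvtxRangePop();
#else
    fallback_log().push_back("pop");
#endif
}

} // namespace marker
```

With neither macro defined, `marker::range_push("scf")` followed by `marker::range_pop()` only appends two entries to the fallback log, which makes the shim easy to exercise without a GPU toolchain.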

@simonpintarelli
Collaborator Author

Build tested locally with CUDA 12 and ROCm 5.7.1, using +cuda+nvtx and +rocm+nvtx respectively.

@gsavva
Collaborator

gsavva commented Sep 26, 2024

@simonpintarelli would it be possible to bring this up to date and possibly merge it?
Or would you suggest we keep the roctracer feature in its separate branch?

@simonpintarelli simonpintarelli marked this pull request as ready for review September 26, 2024 17:05
@simonpintarelli simonpintarelli changed the title WIP: add nvtx equivalent for rocm add nvtx equivalent for rocm Sep 26, 2024
@simonpintarelli
Collaborator Author

Thanks for the reminder @gsavva. Is it correct that it worked for you on LUMI?

@gsavva
Collaborator

gsavva commented Sep 27, 2024

> Thanks for the reminder @gsavva. Is it correct that it worked for you on lumi?

Yes, I was using it on LUMI-G, and it worked with one caveat, described in issue #961: I had to comment out a few timers for the rocprofiler post-processing script to function properly.

Collaborator

@toxa81 toxa81 left a comment

Looks good to be merged

@toxa81
Collaborator

toxa81 commented Oct 4, 2024

ping @gsavva

@gsavva
Collaborator

gsavva commented Oct 4, 2024

@toxa81 I'll need to check the status of #961 with rocprof and my extra GPU timers and give my feedback.

- add dependency on `roctracer-dev` when(+rocm+nvtx)
- conflict +nvtx when neither rocm nor cuda is enabled
@gsavva
Collaborator

gsavva commented Oct 14, 2024

test_lr_solver fails when run with --roctx-trace:
srun -u -n1 rocprof --roctx-trace test_lr_solver --device=gpu --N=2 --num_bands=10

test_lr_solver : Failed
exception occured:
SpFFT: GPU FFT error

On the other hand, test_lr_solver runs and completes fine when:

  • it is run on its own (no rocprof),
  • it is run under rocprof without --roctx-trace (the option meant for tracing and visualizing specific regions in the code; SIRIUS timers are taken as regions)

I would suggest merging this PR and investigating this issue separately (it might be related to the recent ROCm update).
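
For readers unfamiliar with how a timer can double as a traced region, here is a minimal sketch of the general RAII pattern, under stated assumptions: `RegionHooks`, `ScopedRegion`, and `make_recording_hooks` are hypothetical names, and the hooks record events into a vector instead of calling a profiler. It is not how SIRIUS implements its timers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pair of callbacks standing in for a profiler's range push/pop calls.
struct RegionHooks {
    std::function<void(const std::string&)> push;
    std::function<void()> pop;
};

// Opens a region on construction and closes it on destruction, so every push
// is matched by exactly one pop even on early return or exception unwinding.
class ScopedRegion {
  public:
    ScopedRegion(const RegionHooks& hooks, const std::string& label)
        : hooks_(hooks)
    {
        hooks_.push(label);
    }

    ~ScopedRegion()
    {
        hooks_.pop();
    }

    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

  private:
    const RegionHooks& hooks_;
};

// Hooks that only record what happened; `events` must outlive the returned hooks.
inline RegionHooks make_recording_hooks(std::vector<std::string>& events)
{
    return RegionHooks{
        [&events](const std::string& label) { events.push_back("push:" + label); },
        [&events]() { events.push_back("pop"); }};
}
```

Nesting two `ScopedRegion` objects yields properly nested push/pop pairs, since the inner object is destroyed before the outer one.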

@simonpintarelli
Collaborator Author

It is indeed crashing. Here is the backtrace:

#0  0x000014c081453d2b in raise () from /lib64/libc.so.6
#1  0x000014c0814553e5 in abort () from /lib64/libc.so.6
#2  0x000014c081aa55c9 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x000014c081ab0bfa in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x000014c081ab0c65 in std::terminate () at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x000014c081ab0eb7 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x14c118464f58 <typeinfo for spfft::GPUFFTError>, 
    dest=0x14c11843fee0 <spfft::GPUFFTError::~GPUFFTError()>) at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_throw.cc:98
#6  0x000014c11844003f in spfft::gpu::fft::check_result(hipfftResult_t) [clone .part.0] ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#7  0x000014c118442a5b in spfft::TransformReal2DGPU<double>::TransformReal2DGPU(spfft::GPUArrayView3D<double>, spfft::GPUArrayView3D<HIP_vector_type<double, 2u> >, spfft::GPUStreamHandle, std::shared_ptr<spfft::GPUArray<char> >) ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#8  0x000014c1184463ed in spfft::ExecutionGPU<double>::ExecutionGPU(int, std::shared_ptr<spfft::Parameters>, spfft::HostArray<std::complex<double> >&, spfft::HostArray<std::complex<double> >&, spfft::GPUArray<HIP_vector_type<double, 2u> >&, spfft::GPUArray<HIP_vector_type<double, 2u> >&, std::shared_ptr<spfft::GPUArray<char> > const&) () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#9  0x000014c11843acd4 in spfft::TransformInternal<double>::TransformInternal(SpfftProcessingUnitType, std::shared_ptr<spfft::GridInternal<double> >, std::shared_ptr<spfft::Parameters>) () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#10 0x000014c11843653a in spfft::Transform::Transform(std::shared_ptr<spfft::GridInternal<double> > const&, SpfftProcessingUnitType, SpfftTransformType, int, int, int, int, int, SpfftIndexFormatType, int const*) ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#11 0x000014c11843ca35 in spfft::Grid::create_transform(SpfftProcessingUnitType, SpfftTransformType, int, int, int, int, int, SpfftIndexFormatType, int const*) const () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#12 0x000014c1203e31c9 in sirius::Simulation_context::update (this=this@entry=0x1ac7180)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/src/context/simulation_context.cpp:873
#13 0x000014c1203e935a in sirius::Simulation_context::initialize (this=<optimized out>)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/src/context/simulation_context.cpp:486
#14 0x000000000047f457 in sirius::create_simulation_context (conf__=..., L__=..., num_atoms__=<optimized out>, coord__=..., add_vloc__=add_vloc__@entry=true, 
    add_dion__=add_dion__@entry=true) at /opt/cray/pe/gcc/12.2.0/snos/include/g++/bits/unique_ptr.h:191
#15 0x000000000043257a in test_lr_solver (args__=...)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/apps/tests/test_lr_solver.cpp:304
#16 0x0000000000440159 in sirius::call_test<int (&)(sirius::cmd_args const&), sirius::cmd_args&> (label__=..., 
    f__=@0x4317c0: {int (const sirius::cmd_args &)} 0x4317c0 <test_lr_solver(sirius::cmd_args const&)>)
    at /opt/cray/pe/gcc/12.2.0/snos/include/g++/bits/char_traits.h:354
#17 0x0000000000428a9b in main (argn=4, argv=0x7ffc24ba23c8)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/apps/tests/test_lr_solver.cpp:344

3 participants