Hang on collecting GRBM_GUI_ACTIVE in LAMMPS #165

Open
skyreflectedinmirrors opened this issue Sep 21, 2022 · 3 comments

Comments

@skyreflectedinmirrors

skyreflectedinmirrors commented Sep 21, 2022

Using:

OMNITRACE_CONFIG_FILE                              = 
OMNITRACE_USE_PERFETTO                             = true
OMNITRACE_USE_TIMEMORY                             = false
OMNITRACE_USE_SAMPLING                             = false
OMNITRACE_USE_PROCESS_SAMPLING                     = false
OMNITRACE_USE_ROCTRACER                            = true
OMNITRACE_USE_ROCM_SMI                             = true
OMNITRACE_USE_KOKKOSP                              = false
OMNITRACE_USE_PID                                  = true
OMNITRACE_USE_RCCLP                                = false
OMNITRACE_USE_ROCPROFILER                          = true
OMNITRACE_USE_ROCTX                                = false
OMNITRACE_OUTPUT_PATH                              = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX                            = 
OMNITRACE_CRITICAL_TRACE                           = false
OMNITRACE_PAPI_EVENTS                              = PAPI_TOT_CYC
OMNITRACE_PERFETTO_BACKEND                         = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB                  = 1024000
OMNITRACE_PERFETTO_FILL_POLICY                     = discard
OMNITRACE_PROCESS_SAMPLING_DURATION                = -1
OMNITRACE_PROCESS_SAMPLING_FREQ                    = 0
OMNITRACE_ROCM_EVENTS                              = GRBM_GUI_ACTIVE
OMNITRACE_SAMPLING_CPUS                            = all
OMNITRACE_SAMPLING_DELAY                           = 0.5
OMNITRACE_SAMPLING_DURATION                        = 0
OMNITRACE_SAMPLING_FREQ                            = 200
OMNITRACE_SAMPLING_GPUS                            = 0,1
OMNITRACE_TIME_OUTPUT                              = true
OMNITRACE_TIMEMORY_COMPONENTS                      = wall_clock
OMNITRACE_VERBOSE                                  = 0
OMNITRACE_ENABLED                                  = true
OMNITRACE_SUPPRESS_CONFIG                          = false
OMNITRACE_SUPPRESS_PARSING                         = false

hangs on the first kernel call:

$ AMD_LOG_LEVEL=3 /home/nicurtis/lammps_benchmarking/install/tpl/openmpi/bin/mpirun --mca pml ucx --mca btl ^vader,tcp,openib,uct -np 1 ./lmp -k on g 1 -sf kk -pk kokkos cuda/aware on neigh half neigh/qeq full newton on -v x 6 -v y 6 -v z 8 -v steps 25 -in in.reaxc.hns -nocite -log TheraC63/reaxff//log.lammps
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace


      ______   .___  ___. .__   __.  __  .___________..______          ___       ______  _______
     /  __  \  |   \/   | |  \ |  | |  | |           ||   _  \        /   \     /      ||   ____|
    |  |  |  | |  \  /  | |   \|  | |  | `---|  |----`|  |_)  |      /  ^  \   |  ,----'|  |__
    |  |  |  | |  |\/|  | |  . `  | |  |     |  |     |      /      /  /_\  \  |  |     |   __|
    |  `--'  | |  |  |  | |  |\   | |  |     |  |     |  |\  \----./  _____  \ |  `----.|  |____
     \______/  |__|  |__| |__| \__| |__|     |__|     | _| `._____/__/     \__\ \______||_______|

    
[066.998]       perfetto.cc:55910 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""

[omnitrace][pid=30219] MPI rank: 0 (0), MPI size: 1 (1)
LAMMPS (23 Jun 2022 - Update 1)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:105)
  will use up to 1 GPU(s) per node
:3:rocdevice.cpp            :416 : 81067696131 us: 30219: [tid:0x7f68d9031280] Initializing HSA stack.
:3:comgrctx.cpp             :33  : 81067696207 us: 30219: [tid:0x7f68d9031280] Loading COMGR library.
:3:rocdevice.cpp            :207 : 81067696378 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5df3880
:3:rocdevice.cpp            :1611: 81067696802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067697588 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e30cb0
:3:rocdevice.cpp            :1611: 81067697802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067698438 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e6e3d0
:3:rocdevice.cpp            :1611: 81067698628 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067699255 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5eabad0
:3:rocdevice.cpp            :1611: 81067699441 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067700248 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5ee91e0
:3:rocdevice.cpp            :1611: 81067700432 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067701884 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f26930
:3:rocdevice.cpp            :1611: 81067702074 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067703320 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f64010
:3:rocdevice.cpp            :1611: 81067703500 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:rocdevice.cpp            :207 : 81067704752 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5fa1710
:3:rocdevice.cpp            :1611: 81067704929 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0

:3:hip_context.cpp          :50  : 81067706380 us: 30219: [tid:0x7f68d9031280] Direct Dispatch: 1
:3:hip_device_runtime.cpp   :517 : 81067708010 us: 30219: [tid:0x7f68d9031280] hipGetDeviceCount: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708019 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c2e0, 0 )
:3:hip_device.cpp           :348 : 81067708219 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708237 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c5f8, 1 )
:3:hip_device.cpp           :348 : 81067708254 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708258 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c910, 2 )
:3:hip_device.cpp           :348 : 81067708286 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708298 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cc28, 3 )
:3:hip_device.cpp           :348 : 81067708312 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708316 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cf40, 4 )
:3:hip_device.cpp           :348 : 81067708329 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708333 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d258, 5 )
:3:hip_device.cpp           :348 : 81067708356 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708367 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d570, 6 )
:3:hip_device.cpp           :348 : 81067708380 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device.cpp           :346 : 81067708385 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d888, 7 )
:3:hip_device.cpp           :348 : 81067708395 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :530 : 81067708403 us: 30219: [tid:0x7f68d9031280] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp   :535 : 81067708424 us: 30219: [tid:0x7f68d9031280] hipSetDevice: Returned hipSuccess : 
:3:hip_memory.cpp           :493 : 81067708445 us: 30219: [tid:0x7f68d9031280] hipMalloc ( 0x7fff288c3f20, 8448 )
:3:rocdevice.cpp            :2093: 81067708474 us: 30219: [tid:0x7f68d9031280] device=0x653dda0, freeMem_ = 0xfeffdf00
:3:hip_memory.cpp           :495 : 81067708478 us: 30219: [tid:0x7f68d9031280] hipMalloc: Returned hipSuccess : 0x7f6051b00000: duration: 33 us
:3:hip_memory.cpp           :1225: 81067708487 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync ( 0x7f6051b00000, 0x7fff288c40c0, 256, hipMemcpyDefault, stream:<null> )
:3:rocdevice.cpp            :2686: 81067708503 us: 30219: [tid:0x7f68d9031280] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp            :2757: 81067721343 us: 30219: [tid:0x7f68d9031280] created hardware queue 0x7f68680ca000 with size 4096 with priority 1, cooperative: 0
:3:devprogram.cpp           :2675: 81067924077 us: 30219: [tid:0x7f68d9031280] Using Code Object V4.
:3:devprogram.cpp           :2978: 81067925217 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp           :2978: 81067925223 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp           :2978: 81067925225 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp           :2978: 81067925227 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp           :2978: 81067925228 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp           :2978: 81067925229 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp           :2978: 81067925230 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:3:devprogram.cpp           :2978: 81067925232 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp           :2978: 81067925233 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp           :2978: 81067925234 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_gwsInit
:3:devprogram.cpp           :2978: 81067925236 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp           :2978: 81067925237 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp           :2978: 81067925238 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp           :2978: 81067925239 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:rocvirtual.hpp           :62  : 81067925542 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d180) for 100000 ns
:3:rocvirtual.cpp           :143 : 81067925558 us: 30219: [tid:0x7f68d9031280] Signal = (0x7f686811d180), start = 81067925545769, end = 81067925547369
:3:hip_memory.cpp           :1226: 81067925567 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync: Returned hipSuccess : : duration: 217080 us
:3:hip_stream.cpp           :450 : 81067925582 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize ( stream:<null> )
:3:rocdevice.cpp            :2636: 81067925599 us: 30219: [tid:0x7f68d9031280] No HW event
:3:hip_stream.cpp           :451 : 81067925601 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize: Returned hipSuccess : 
:3:hip_memory.cpp           :2461: 81067925613 us: 30219: [tid:0x7f68d9031280] hipMemset ( 0x7f6051b00100, 0, 8192 )
:3:rocvirtual.cpp           :679 : 81067925626 us: 30219: [tid:0x7f68d9031280] Arg3: ulong* bufULong = ptr:0x7f6051b00000 obj:[0x7f6051b00000-0x7f6051b02100]
:3:rocvirtual.cpp           :679 : 81067925628 us: 30219: [tid:0x7f68d9031280] Arg4: uchar* pattern = ptr:0x7f686807c080 obj:[0x7f686807c000-0x7f686807d000]
:3:rocvirtual.cpp           :753 : 81067925630 us: 30219: [tid:0x7f68d9031280] Arg5: uint patternSize = val:1
:3:rocvirtual.cpp           :753 : 81067925631 us: 30219: [tid:0x7f68d9031280] Arg6: ulong offset = val:32
:3:rocvirtual.cpp           :753 : 81067925633 us: 30219: [tid:0x7f68d9031280] Arg7: ulong size = val:1024
:3:rocvirtual.cpp           :2723: 81067925634 us: 30219: [tid:0x7f68d9031280] ShaderName : __amd_rocclr_fillBufferAligned
:3:rocvirtual.hpp           :62  : 81067935725 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d080) for -1 ns
# hangs here forever
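
For triaging longer logs of this kind, a small helper can pick out the waits that never time out. This is a minimal sketch, not part of omnitrace or ROCm: the function name `find_unbounded_waits` and the regular expression are illustrative, and it assumes that a `-1 ns` timeout on a `Host active wait for Signal` line means the host waits indefinitely, which matches where this run stalls.

```python
import re

# Matches ROCclr log lines of the form seen with AMD_LOG_LEVEL=3, e.g.
#   ":3:rocvirtual.hpp :62 : 81067935725 us: 30219: [tid:0x...] "
#   "Host active wait for Signal = (0x7f686811d080) for -1 ns"
# The pattern is derived from the log in this issue only (assumption).
WAIT_RE = re.compile(
    r"(?P<ts>\d+) us:.*Host active wait for Signal = "
    r"\((?P<signal>0x[0-9a-fA-F]+)\) for (?P<timeout>-?\d+) ns"
)


def find_unbounded_waits(log_text):
    """Return (timestamp_us, signal) for each wait logged with a negative timeout."""
    hits = []
    for line in log_text.splitlines():
        match = WAIT_RE.search(line)
        # Assumption: a negative timeout (-1 ns) means "wait without a deadline".
        if match and int(match.group("timeout")) < 0:
            hits.append((int(match.group("ts")), match.group("signal")))
    return hits
```

Run against the log above, this reports only the final wait (signal `0x7f686811d080`); the earlier wait on `0x7f686811d180` carries a 100000 ns timeout and is skipped.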
@skyreflectedinmirrors
Author

On 472e96a

@skyreflectedinmirrors
Author

No hang with OMNITRACE_PAPI_EVENTS, but the events don't show up in the trace either.

@ppanchad-amd

Hi @skyreflectedinmirrors, is this ticket still relevant? Thanks!
