Using:
OMNITRACE_CONFIG_FILE =
OMNITRACE_USE_PERFETTO = true
OMNITRACE_USE_TIMEMORY = false
OMNITRACE_USE_SAMPLING = false
OMNITRACE_USE_PROCESS_SAMPLING = false
OMNITRACE_USE_ROCTRACER = true
OMNITRACE_USE_ROCM_SMI = true
OMNITRACE_USE_KOKKOSP = false
OMNITRACE_USE_PID = true
OMNITRACE_USE_RCCLP = false
OMNITRACE_USE_ROCPROFILER = true
OMNITRACE_USE_ROCTX = false
OMNITRACE_OUTPUT_PATH = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX =
OMNITRACE_CRITICAL_TRACE = false
OMNITRACE_PAPI_EVENTS = PAPI_TOT_CYC
OMNITRACE_PERFETTO_BACKEND = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB = 1024000
OMNITRACE_PERFETTO_FILL_POLICY = discard
OMNITRACE_PROCESS_SAMPLING_DURATION = -1
OMNITRACE_PROCESS_SAMPLING_FREQ = 0
OMNITRACE_ROCM_EVENTS = GRBM_GUI_ACTIVE
OMNITRACE_SAMPLING_CPUS = all
OMNITRACE_SAMPLING_DELAY = 0.5
OMNITRACE_SAMPLING_DURATION = 0
OMNITRACE_SAMPLING_FREQ = 200
OMNITRACE_SAMPLING_GPUS = 0,1
OMNITRACE_TIME_OUTPUT = true
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock
OMNITRACE_VERBOSE = 0
OMNITRACE_ENABLED = true
OMNITRACE_SUPPRESS_CONFIG = false
OMNITRACE_SUPPRESS_PARSING = false
The run hangs on the first kernel call:
$ AMD_LOG_LEVEL=3 /home/nicurtis/lammps_benchmarking/install/tpl/openmpi/bin/mpirun --mca pml ucx --mca btl ^vader,tcp,openib,uct -np 1 ./lmp -k on g 1 -sf kk -pk kokkos cuda/aware on neigh half neigh/qeq full newton on -v x 6 -v y 6 -v z 8 -v steps 25 -in in.reaxc.hns -nocite -log TheraC63/reaxff//log.lammps
[omnitrace][omnitrace_init_tooling] Instrumentation mode: Trace
[066.998] perfetto.cc:55910 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
[omnitrace][pid=30219] MPI rank: 0 (0), MPI size: 1 (1)
LAMMPS (23 Jun 2022 - Update 1)
KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:105)
will use up to 1 GPU(s) per node
:3:rocdevice.cpp :416 : 81067696131 us: 30219: [tid:0x7f68d9031280] Initializing HSA stack.
:3:comgrctx.cpp :33 : 81067696207 us: 30219: [tid:0x7f68d9031280] Loading COMGR library.
:3:rocdevice.cpp :207 : 81067696378 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5df3880
:3:rocdevice.cpp :1611: 81067696802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067697588 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e30cb0
:3:rocdevice.cpp :1611: 81067697802 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067698438 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5e6e3d0
:3:rocdevice.cpp :1611: 81067698628 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067699255 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[2]=0x5b88910(fine=0x5b88b60,coarse=0x5b97640) for gpu agent=0x5eabad0
:3:rocdevice.cpp :1611: 81067699441 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067700248 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5ee91e0
:3:rocdevice.cpp :1611: 81067700432 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067701884 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f26930
:3:rocdevice.cpp :1611: 81067702074 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067703320 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5f64010
:3:rocdevice.cpp :1611: 81067703500 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:rocdevice.cpp :207 : 81067704752 us: 30219: [tid:0x7f68d9031280] Numa selects cpu agent[6]=0x5b9bfc0(fine=0x5b9c1e0,coarse=0x5b9c960) for gpu agent=0x5fa1710
:3:rocdevice.cpp :1611: 81067704929 us: 30219: [tid:0x7f68d9031280] HMM support: 1, xnack: 0, direct host access: 0
:3:hip_context.cpp :50 : 81067706380 us: 30219: [tid:0x7f68d9031280] Direct Dispatch: 1
:3:hip_device_runtime.cpp :517 : 81067708010 us: 30219: [tid:0x7f68d9031280] hipGetDeviceCount: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708019 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c2e0, 0 )
:3:hip_device.cpp :348 : 81067708219 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708237 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c5f8, 1 )
:3:hip_device.cpp :348 : 81067708254 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708258 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5c910, 2 )
:3:hip_device.cpp :348 : 81067708286 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708298 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cc28, 3 )
:3:hip_device.cpp :348 : 81067708312 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708316 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5cf40, 4 )
:3:hip_device.cpp :348 : 81067708329 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708333 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d258, 5 )
:3:hip_device.cpp :348 : 81067708356 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708367 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d570, 6 )
:3:hip_device.cpp :348 : 81067708380 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device.cpp :346 : 81067708385 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties ( 0x3e5d888, 7 )
:3:hip_device.cpp :348 : 81067708395 us: 30219: [tid:0x7f68d9031280] hipGetDeviceProperties: Returned hipSuccess :
:3:hip_device_runtime.cpp :530 : 81067708403 us: 30219: [tid:0x7f68d9031280] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp :535 : 81067708424 us: 30219: [tid:0x7f68d9031280] hipSetDevice: Returned hipSuccess :
:3:hip_memory.cpp :493 : 81067708445 us: 30219: [tid:0x7f68d9031280] hipMalloc ( 0x7fff288c3f20, 8448 )
:3:rocdevice.cpp :2093: 81067708474 us: 30219: [tid:0x7f68d9031280] device=0x653dda0, freeMem_ = 0xfeffdf00
:3:hip_memory.cpp :495 : 81067708478 us: 30219: [tid:0x7f68d9031280] hipMalloc: Returned hipSuccess : 0x7f6051b00000: duration: 33 us
:3:hip_memory.cpp :1225: 81067708487 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync ( 0x7f6051b00000, 0x7fff288c40c0, 256, hipMemcpyDefault, stream:<null> )
:3:rocdevice.cpp :2686: 81067708503 us: 30219: [tid:0x7f68d9031280] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp :2757: 81067721343 us: 30219: [tid:0x7f68d9031280] created hardware queue 0x7f68680ca000 with size 4096 with priority 1, cooperative: 0
:3:devprogram.cpp :2675: 81067924077 us: 30219: [tid:0x7f68d9031280] Using Code Object V4.
:3:devprogram.cpp :2978: 81067925217 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp :2978: 81067925223 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp :2978: 81067925225 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp :2978: 81067925227 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp :2978: 81067925228 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp :2978: 81067925229 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp :2978: 81067925230 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:3:devprogram.cpp :2978: 81067925232 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp :2978: 81067925233 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp :2978: 81067925234 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_gwsInit
:3:devprogram.cpp :2978: 81067925236 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp :2978: 81067925237 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp :2978: 81067925238 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp :2978: 81067925239 us: 30219: [tid:0x7f68d9031280] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:rocvirtual.hpp :62 : 81067925542 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d180) for 100000 ns
:3:rocvirtual.cpp :143 : 81067925558 us: 30219: [tid:0x7f68d9031280] Signal = (0x7f686811d180), start = 81067925545769, end = 81067925547369
:3:hip_memory.cpp :1226: 81067925567 us: 30219: [tid:0x7f68d9031280] hipMemcpyAsync: Returned hipSuccess : : duration: 217080 us
:3:hip_stream.cpp :450 : 81067925582 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize ( stream:<null> )
:3:rocdevice.cpp :2636: 81067925599 us: 30219: [tid:0x7f68d9031280] No HW event
:3:hip_stream.cpp :451 : 81067925601 us: 30219: [tid:0x7f68d9031280] hipStreamSynchronize: Returned hipSuccess :
:3:hip_memory.cpp :2461: 81067925613 us: 30219: [tid:0x7f68d9031280] hipMemset ( 0x7f6051b00100, 0, 8192 )
:3:rocvirtual.cpp :679 : 81067925626 us: 30219: [tid:0x7f68d9031280] Arg3: ulong* bufULong = ptr:0x7f6051b00000 obj:[0x7f6051b00000-0x7f6051b02100]
:3:rocvirtual.cpp :679 : 81067925628 us: 30219: [tid:0x7f68d9031280] Arg4: uchar* pattern = ptr:0x7f686807c080 obj:[0x7f686807c000-0x7f686807d000]
:3:rocvirtual.cpp :753 : 81067925630 us: 30219: [tid:0x7f68d9031280] Arg5: uint patternSize = val:1
:3:rocvirtual.cpp :753 : 81067925631 us: 30219: [tid:0x7f68d9031280] Arg6: ulong offset = val:32
:3:rocvirtual.cpp :753 : 81067925633 us: 30219: [tid:0x7f68d9031280] Arg7: ulong size = val:1024
:3:rocvirtual.cpp :2723: 81067925634 us: 30219: [tid:0x7f68d9031280] ShaderName : __amd_rocclr_fillBufferAligned
:3:rocvirtual.hpp :62 : 81067935725 us: 30219: [tid:0x7f68d9031280] Host active wait for Signal = (0x7f686811d080) for -1 ns # hangs here forever
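To locate where a run like this is stuck, one option is to scan the AMD_LOG_LEVEL=3 output for HIP API calls that are logged as entered but never as returned. The following is a minimal sketch in Python, assuming the entry prefix format seen in the log above (`:level:file :line : timestamp us: pid: [tid:...] message`). `find_stuck_calls` is a hypothetical helper, not a ROCm tool, and it can only see calls that the runtime actually logs.

```python
import re
from collections import defaultdict

# Prefix of one AMD_LOG_LEVEL entry (format assumed from this report), e.g.
#   ":3:hip_memory.cpp :493 : 81067708445 us: 30219: [tid:0x7f68d9031280] "
# The message is everything up to the next prefix (or the end of the text), so
# this works whether the log is one entry per line or run together on one line.
_ENTRY = re.compile(
    r":\d+:\S+\s*:\s*\d+\s*:\s*(\d+) us:\s*\d+:\s*\[tid:(0x[0-9a-fA-F]+)\]\s*"
)
_CALL = re.compile(r"^(hip\w+)\s*\(")         # e.g. "hipMemset ( ptr, 0, 8192 )"
_RETURN = re.compile(r"^(hip\w+): Returned")  # e.g. "hipMalloc: Returned hipSuccess"


def find_stuck_calls(log: str) -> dict[str, dict]:
    """Hypothetical triage helper: per thread id, list HIP calls that were logged
    as entered but never as returned, plus the thread's last logged message."""
    matches = list(_ENTRY.finditer(log))
    open_calls = defaultdict(list)  # tid -> list of (timestamp_us, api_name)
    last_msg = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        ts, tid = int(m.group(1)), m.group(2)
        msg = log[m.end():end].strip()
        last_msg[tid] = msg
        if (call := _CALL.match(msg)):
            open_calls[tid].append((ts, call.group(1)))
        elif (ret := _RETURN.match(msg)):
            stack = open_calls[tid]
            # Remove the most recent open call with the same name. Some calls
            # (e.g. hipGetDeviceCount) only log their return, so a return with
            # no matching open call is simply ignored.
            for j in range(len(stack) - 1, -1, -1):
                if stack[j][1] == ret.group(1):
                    del stack[j]
                    break
    return {
        tid: {"open_calls": open_calls[tid], "last_message": last_msg[tid]}
        for tid in last_msg
    }
```

Applied to the log in this report, it should leave only the `hipMemset` call open on thread `0x7f68d9031280`, with the unbounded `for -1 ns` signal wait (issued after the `__amd_rocclr_fillBufferAligned` launch) as that thread's last message, which is where the report marks the hang.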
On 472e96a
No hang with OMNITRACE_PAPI_EVENTS, but it doesn't show up in the trace either.
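When narrowing down which setting (such as OMNITRACE_PAPI_EVENTS) changes the behavior, it can help to diff the settings dump of one run against another. The following is a minimal sketch in Python, assuming the dump is saved as text in the `KEY = value` form shown in the report, either one entry per line or run together on a single line. `parse_config_dump` and `diff_configs` are hypothetical helpers, not part of omnitrace.

```python
import re

# One "OMNITRACE_<NAME> =" key; its value is the text up to the next key (or the
# end of the dump), so empty values such as "OMNITRACE_CONFIG_FILE =" are kept.
_KEY = re.compile(r"\b(OMNITRACE_[A-Z0-9_]+)\s*=")


def parse_config_dump(text: str) -> dict[str, str]:
    """Hypothetical helper: parse an omnitrace settings dump into {key: value}."""
    matches = list(_KEY.finditer(text))
    config = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        config[m.group(1)] = text[m.end():end].strip()
    return config


def diff_configs(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Hypothetical helper: {key: (value_in_a, value_in_b)} for keys that differ.
    A key missing from one side is reported as None on that side."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    run_a = parse_config_dump(
        "OMNITRACE_USE_ROCPROFILER = true OMNITRACE_PAPI_EVENTS = PAPI_TOT_CYC"
    )
    run_b = dict(run_a, OMNITRACE_PAPI_EVENTS="")
    print(diff_configs(run_a, run_b))
    # {'OMNITRACE_PAPI_EVENTS': ('PAPI_TOT_CYC', '')}
```

Changing one setting at a time and diffing the dumps keeps the comparison down to exactly the keys that differ between runs.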
Hi @skyreflectedinmirrors, is this ticket still relevant? Thanks!