add error handling in fbridgecreate (fcheck.exe crashes if simd mode is not supported) #520

Open
valassi opened this issue Aug 29, 2022 · 0 comments


@valassi
Member

valassi commented Aug 29, 2022

fcheck.exe crashes if simd mode is not supported

If I run throughputX.sh with the sanity checks removed, check.exe fails gracefully while fcheck.exe crashes:

runExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/check.exe -p 64 256 1 OMP=
ERROR! The application is built for skylake-avx512 (AVX512VL) but the host does not support it
           652,423      cycles:u                  #    0.175 GHz                      (2.62%)
           172,400      instructions:u            #    0.26  insn per cycle           (24.62%)
       0.005887559 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1266) (512y:   60) (512z: 9903)
-------------------------------------------------------------------------
cmpExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/check.exe --common -p 2 64 2
cmpExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/fcheck.exe 2 64 2

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0x7fc518dac3ff in ???
#1  0x7fc519b20eb4 in ???
#2  0x7fc519b01e62 in ???
#3  0x7fc519b0c8a3 in ???
#4  0x40415e in ???
#5  0x40440a in ???
#6  0x7fc518d98554 in ???
#7  0x403adf in ???
#8  0xffffffffffffffff in ???
Avg ME (C++/C++)    = 
Avg ME (F77/C++)    = 
ERROR! Fortran calculation (F77/C++)  crashed

We should add graceful termination in fcheck too, and eventually in madevent. (It is not yet clear how we will bridge between the different AVX builds; for the moment, madevent would crash in exactly the same way!)
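For reference, a host capability check of the kind that lets check.exe fail gracefully might look like the following. This is a minimal sketch, assuming GCC or Clang on x86-64 (it relies on the compiler builtin `__builtin_cpu_supports`). The names `hostSupportsBuildSimd` and `checkHostOrFail` are hypothetical and are not the actual madgraph4gpu implementation of `MatrixElementKernelHost::hostSupportsSIMD()`.

```cpp
#include <cstdio>

// Hypothetical helper (not the madgraph4gpu API): returns true if the host CPU
// supports the widest SIMD instruction set this translation unit was compiled for.
// Uses the GCC/Clang builtin __builtin_cpu_supports, which is x86-only; the
// preprocessor guards mean non-x86 or scalar builds simply return true.
bool hostSupportsBuildSimd()
{
#if defined __AVX512VL__
  return __builtin_cpu_supports( "avx512vl" );
#elif defined __AVX2__
  return __builtin_cpu_supports( "avx2" );
#elif defined __SSE4_2__
  return __builtin_cpu_supports( "sse4.2" );
#else
  return true; // no special instruction set required by this build
#endif
}

// Hypothetical early check: print a readable error and return a non-zero
// status instead of letting the program die later with SIGILL.
int checkHostOrFail( const char* buildTag )
{
  if( !hostSupportsBuildSimd() )
  {
    std::fprintf( stderr, "ERROR! The application is built for %s but the host does not support it\n", buildTag );
    return 1;
  }
  return 0;
}
```

Such a check only helps if it runs before any code that uses the wider instructions, so it is best placed at the start of the program, ideally in a translation unit built without the AVX512 flags, since the compiler may otherwise emit AVX512 instructions before the check executes.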

We should probably:

  • add an "if( !MatrixElementKernelHost::hostSupportsSIMD() )" check in the Bridge constructor, and throw an exception if it fails
  • catch that exception wherever we call the Bridge constructor, e.g. in fbridge_create, and return 0 as the bridge pointer in that case
  • check in driver.f and fcheck.f that fbridgecreate returned a non-null bridge pointer (i.e. something >0), and otherwise fail gracefully
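The three proposed steps can be sketched in C++ as follows. This is a minimal illustration of the error-handling pattern only, under stated assumptions: `BridgeSketch`, `MatrixElementKernelHostSketch`, `fbridgecreate_sketch_`, `fbridgedelete_sketch_` and `runWithBridge` are hypothetical stand-ins (the real Bridge class and `fbridgecreate_` take more arguments), and step 3, which would really live in driver.f and fcheck.f, is emulated by a C++ caller.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical stand-in for MatrixElementKernelHost; the static flag exists
// only so that an unsupported host can be simulated.
struct MatrixElementKernelHostSketch
{
  static bool hostSupportsSIMD() { return s_hostSupportsSIMD; }
  static bool s_hostSupportsSIMD;
};
bool MatrixElementKernelHostSketch::s_hostSupportsSIMD = true;

// Hypothetical stand-in for Bridge.
class BridgeSketch
{
public:
  explicit BridgeSketch( int nevt ) : m_nevt( nevt )
  {
    // Step 1: refuse to build a bridge that would later hit SIGILL
    if( !MatrixElementKernelHostSketch::hostSupportsSIMD() )
      throw std::runtime_error( "Bridge constructor: SIMD mode not supported on this host" );
  }
  int nevt() const { return m_nevt; }
private:
  int m_nevt;
};

// Step 2: the Fortran-callable factory catches the exception and hands back
// a null pointer (the trailing underscore mimics gfortran name mangling).
extern "C" void fbridgecreate_sketch_( BridgeSketch** ppbridge, const int* pnevt )
{
  try
  {
    *ppbridge = new BridgeSketch( *pnevt );
  }
  catch( const std::exception& e )
  {
    std::fprintf( stderr, "ERROR! %s\n", e.what() );
    *ppbridge = nullptr; // signals failure to the caller
  }
}

extern "C" void fbridgedelete_sketch_( BridgeSketch** ppbridge )
{
  delete *ppbridge;
  *ppbridge = nullptr;
}

// Step 3 (emulating driver.f / fcheck.f): check the pointer before using it.
int runWithBridge( int nevt )
{
  BridgeSketch* bridge = nullptr;
  fbridgecreate_sketch_( &bridge, &nevt );
  if( bridge == nullptr )
  {
    std::fprintf( stderr, "ERROR! Bridge creation failed, exiting gracefully\n" );
    return 1;
  }
  // ... matrix elements for bridge->nevt() events would be computed here ...
  fbridgedelete_sketch_( &bridge );
  return 0;
}
```

Since the proposal checks for a value >0, the Fortran side presumably holds the bridge pointer as an integer, so step 3 there would be a comparison against 0 followed by a clean STOP, rather than the C++ null check shown here.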
@valassi valassi changed the title add eerror handling in fbridgecreate (fcheck.exe crashes if simd mode is not supported) add error handling in fbridgecreate (fcheck.exe crashes if simd mode is not supported) Aug 29, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this issue Aug 29, 2022
valassi added a commit to valassi/madgraph4gpu that referenced this issue Aug 29, 2022
Revert "[bmk] try to remove the hardware checks in throughputX.sh - not a good idea (madgraph5#520), will revert"
This reverts commit 6edb773.