CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731

Closed
valassi opened this issue Jul 21, 2023 · 3 comments · Fixed by #723

valassi commented Jul 21, 2023

My MR #723 has another issue that went undetected by most tests (I only found it while manually building pp_tt_W with HRDCOD=1 to test #701): non-SM processes also fail the HRDCOD=1 build. This is probably a minor issue, but since it went undetected, the tests should be made stronger and wider.

We should add heft_gg_h (or another non-SM process with HRDCOD=1) to the tput tests and to the CI tests; this is now tracked in #732.

valassi commented Jul 21, 2023

See 840a81a for one of the last commits of #723

Now launching fails with a new build error (in cuda)
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW

            ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_sm_no_b_mass.cc -o Parameters_sm_no_b_mass_cu.o
            In file included from Parameters_sm_no_b_mass.cc:15:
            Parameters_sm_no_b_mass.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
               26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
                  |  ^~~~~

Since I want to use CPP only, I retry disabling also CUDA:

valassi commented Jul 21, 2023

The same error can be reproduced more simply in heft

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

valassi commented Jul 21, 2023

In upstream/master this was

ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I.  -fPIC -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -DMGONGPU_HARDCODE_PARAM -c Parameters_heft.cc -o Parameters_heft.o

In MR #723 this was

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

The difference is that the namespace MR now builds this file separately for CUDA, which was not the case before. The new nvcc command carries none of the flags passed to g++, including -DMGONGPU_HARDCODE_PARAM.
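
The mechanism behind this error can be illustrated with a minimal sketch. It assumes the header guards on a preprocessor define and refuses to compile when that define is missing. The macro names SKETCH_NON_SM_PROCESS and SKETCH_HARDCODE_PARAM and both helper functions are hypothetical stand-ins; the only name taken from the log is MGONGPU_HARDCODE_PARAM, and the real header may be organised differently.

```cpp
// Sketch only: SKETCH_NON_SM_PROCESS and SKETCH_HARDCODE_PARAM are hypothetical
// stand-ins, not macros from the project (the log shows MGONGPU_HARDCODE_PARAM).
#include <cassert>
#include <string>

// Guard: a non-SM process refuses to compile unless the hardcode flag is defined.
#if defined( SKETCH_NON_SM_PROCESS ) && !defined( SKETCH_HARDCODE_PARAM )
#error This non-SM process only supports hardcoded-parameter builds
#endif

// Report which parameter mode this translation unit was compiled in.
inline std::string parameterMode()
{
#ifdef SKETCH_HARDCODE_PARAM
  return "hardcoded";
#else
  return "runtime";
#endif
}

// Runtime counterpart of the guard: aborts (in debug builds) if the flag was not passed.
inline void requireHardcodedParameters()
{
  assert( parameterMode() == "hardcoded" );
}
```

Compiling this file with only -DSKETCH_NON_SM_PROCESS stops at the #error, while adding -DSKETCH_HARDCODE_PARAM lets it build. This mirrors the two commands above: the g++ line passes -DMGONGPU_HARDCODE_PARAM, the nvcc line does not.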

valassi changed the title from "Add heft_gg_h (or another non-SM process with HRDCOD=1) to tput tests and to CI tests" to "CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build)" on Jul 21, 2023
valassi self-assigned this on Jul 21, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…esses towards src - this fixes HRDCOD=1 builds on non-SM processes madgraph5#731
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…da of non-SM) to CODEGEN from heft_gg_h.sa
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
…madgraph5#730 and madgraph5#731

This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively.

Unfortunately, I tested that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701,
but there are still other issues remaining (being debugged in branch nobm and in madgraph5#733):
  IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
(this is the merge of fpe as of commit 3658f3f, before fixing madgraph5#730 and madgraph5#731)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023
… fpe with the fixes for madgraph5#730 and madgraph5#731

Now the CUDA build of nobm_pp_ttW works - but the SIMD execution still fails with three FPEs madgraph5#733
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW.mad

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gu_ttxwpd
INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:     P1_gd_ttxwmu
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023
… builds in cuda of non-SM) to CODEGEN from heft_gg_h.sa