CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) #731

valassi · 2023-07-21T13:26:27Z


My MR #723 has another issue that went undetected by most tests (I only found it while manually building pp_tt_W with HRDCOD=1 to test #701): HRDCOD=1 builds of non-SM processes also fail. This is probably a minor issue, but since it went undetected, the tests should be made stronger and wider.

We should add heft_gg_h (or another non-SM process with HRDCOD=1) to the tput tests and to the CI tests; this is now tracked in #732.

valassi · 2023-07-21T13:27:58Z

See 840a81a for one of the last commits of #723

Now the launch fails with a new build error (in CUDA):
HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW

            ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_sm_no_b_mass.cc -o Parameters_sm_no_b_mass_cu.o
            In file included from Parameters_sm_no_b_mass.cc:15:
            Parameters_sm_no_b_mass.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
               26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (https://github.com/madgraph5/madgraph4gpu/issues/439): please run "make HRDCOD=1"
                  |  ^~~~~

Since I only want to use CPP, I retry with CUDA disabled as well:

valassi · 2023-07-21T13:28:49Z

The same error can be reproduced more simply in heft:

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

valassi · 2023-07-21T13:33:58Z

In upstream/master, this file was only built by the C++ compiler, with -DMGONGPU_HARDCODE_PARAM among its flags:

ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/g++  -O3  -std=c++17 -I.  -fPIC -Wall -Wshadow -Wextra -ffast-math  -fopenmp -march=skylake-avx512 -mprefer-vector-width=256  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -DMGONGPU_HARDCODE_PARAM -c Parameters_heft.cc -o Parameters_heft.o

In MR #723, it is also built separately by nvcc, without any of these -D flags, and that build fails:

ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -fPIC -c -x cu Parameters_heft.cc -o Parameters_heft_cu.o
In file included from Parameters_heft.cc:15:
Parameters_heft.h:26:2: error: #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
   26 | #error This non-SM physics process only supports MGONGPU_HARDCODE_PARAM builds (#439): please run "make HRDCOD=1"
      |  ^~~~~

The difference is that the namespace MR (#723) introduced a separate CUDA build of this file, which was not built separately for CUDA before, and this CUDA build does not receive the same build flags as the C++ build.
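A minimal sketch of the propagation problem, assuming (as the title of commit 324581d suggests) that the SubProcesses makefile computes CUFLAGS, including -DMGONGPU_HARDCODE_PARAM for HRDCOD=1, and then runs a recursive make in src, where Parameters_*.cc is compiled by nvcc. The two makefiles generated below are hypothetical stand-ins, not the real cudacpp makefiles, and the script assumes GNU make. A variable assigned in a parent makefile only reaches the environment of a sub-make if it is exported, so without `export CUFLAGS` the src build sees an empty CUFLAGS.

```shell
#!/bin/sh
# Hypothetical reproduction: does a flag set in a parent ("SubProcesses")
# makefile reach a recursive make in "src"? Assumes GNU make.
set -eu
unset CUFLAGS MAKEFLAGS  # start from a clean environment

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/src"
cd "$tmp"

# Stand-in for src/Makefile: print the CUFLAGS that an nvcc rule would use.
printf 'all:\n\t@echo "src sees CUFLAGS=[$(CUFLAGS)]"\n' > src/Makefile

# Stand-ins for the SubProcesses makefile, without and with "export CUFLAGS".
printf 'CUFLAGS += -DMGONGPU_HARDCODE_PARAM\nall:\n\t@$(MAKE) --no-print-directory -C src\n' > Makefile.noexport
printf 'CUFLAGS += -DMGONGPU_HARDCODE_PARAM\nexport CUFLAGS\nall:\n\t@$(MAKE) --no-print-directory -C src\n' > Makefile.export

# Capture what the sub-make sees in each case.
noexport=$(make --no-print-directory -f Makefile.noexport)
withexport=$(make --no-print-directory -f Makefile.export)
echo "without export: $noexport"
echo "with export:    $withexport"
```

In the first case the sub-make sees an empty CUFLAGS, mirroring the bare `nvcc -Xcompiler -fPIC` command in the failing log; in the second it sees -DMGONGPU_HARDCODE_PARAM. Passing `CUFLAGS="$(CUFLAGS)"` on the sub-make command line would also work, but command-line variables override ordinary assignments in src/Makefile, so `export` is arguably the less intrusive choice.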


…madgraph5#730 and madgraph5#731. This completes the fpe and namespace patches, addressing madgraph5#701 and madgraph5#725, respectively. Unfortunately, my tests show that this patch only fixes the IEEE_DIVIDE_BY_ZERO part of madgraph5#701; other issues remain (being debugged in branch nobm and in madgraph5#733): IEEE_INVALID_FLAG, IEEE_UNDERFLOW_FLAG, IEEE_DENORMAL.


… fpe with the fixes for madgraph5#730 and madgraph5#731. Now the CUDA build of nobm_pp_ttW works, but the SIMD execution still fails with three FPEs (madgraph5#733):

    HRDCOD=1 tlau/lauX.sh -CPP nobm_pp_ttW.mad
    INFO: Running Survey
    Creating Jobs
    Working on SubProcesses
    INFO: P1_gu_ttxwpd
    INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
    INFO: P1_gd_ttxwmu
    Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
    Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
    Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL


valassi mentioned this issue Jul 21, 2023

Add heft_gg_h (or another non-SM process with HRDCOD=1) to tput tests and to CI tests #732

Closed

valassi changed the title ~~Add heft_gg_h (or another non-SM process with HRDCOD=1) to tput tests and to CI tests~~ CUDA builds of Parameters.cc get the wrong build flags (eg they fail the HRDCOD=1 build) Jul 21, 2023

valassi self-assigned this Jul 21, 2023

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023

[namespace/fpe] in ggtt.sa makefiles, add 'export CUFLAGS' in SubProcesses towards src - this fixes HRDCOD=1 builds on non-SM processes madgraph5#731

324581d

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023

[namespace/fpe] backport fix for madgraph5#731 (HRDCOD=1 builds in cuda of non-SM) to CODEGEN from heft_gg_h.sa

a1d5983

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023

[fpe] regenerate gg_tt and heft_gg_h sa - all ok, differences as expected from madgraph5#730 and madgraph5#731

66b8cfe

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023

[fpe] regenerate the other 5 processes sa with fixes for madgraph5#730 and madgraph5#731

838e59a

valassi linked a pull request Jul 21, 2023 that will close this issue

Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault #723

Merged

valassi closed this as completed in #723 Jul 21, 2023

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jul 21, 2023

Merge branch 'fpe' into nobm

b1ad04e

(this is the merge of fpe as of commit 3658f3f, before fixing madgraph5#730 and madgraph5#731)

This was referenced Jul 21, 2023

Fixes in xxxxx for IEEE_DIVIDE_BY_ZERO FPE; separate cpu/gpu namespaces and fix runtest segfault #723

Merged

Clean up interaction of Subprocess and src Makefile (export, override etc) #414

Open

valassi added a commit to mg5amcnlo/mg5amcnlo_cudacpp that referenced this issue Aug 16, 2023

[namespace/fpe] backport fix for madgraph5/madgraph4gpu#731 (HRDCOD=1 builds in cuda of non-SM) to CODEGEN from heft_gg_h.sa

d110a2e
