(WIP) HELINL=L (L for linker) helas mode: pre-compile templates into separate .o object files (using RDC for CUDA; still missing HIP) #978

Draft
wants to merge 56 commits into
base: master
475463b
[helas] in gg_tt.mad, proof of concept for removing template/inline F…
valassi Aug 27, 2024
6b0ba37
[helas] in gg_tt.mad and CODEGEN, add comments in MemoryAccessGs.h an…
valassi Aug 28, 2024
5d26244
[helas] in gg_tt.mad, compile HelAmps.o as a separate object file in …
valassi Aug 28, 2024
24c4fee
[helas] in gg_tt.mad, add RDC to ensure that cuda builds succeed and …
valassi Aug 28, 2024
7aef7e2
[helas] in gg_tt.mad, avoid link warnings when using RDC
valassi Aug 28, 2024
77d157c
[helas] in gg_tt.mad, clean up 'linked HelAmps' implementation: add o…
valassi Aug 28, 2024
f105b9c
[helas] in tput/teeThroughputX.sh, print out the preliminary build ti…
valassi Aug 28, 2024
5f73fbb
[helas] in tput throughputX.sh and teeThroughputX.sh, add the -inlL a…
valassi Aug 28, 2024
8fe9ba4
[helas] in tput/allTees.sh, add 18 inlL tests
valassi Aug 28, 2024
4ee2863
[helas] in gg_tt.mad, fix clang formatting
valassi Aug 28, 2024
0b259a8
[helas] in gg_tt.mad, fix inlineHel=L printout in check_sa.cc
valassi Aug 28, 2024
7fb5a25
[helas] in gg_tt.mad CPPProcess.cc and HelAmps_sm.h, move code around…
valassi Aug 28, 2024
716326c
[helas] in gg_tt.mad cudacpp.mk, build HelAmps.o and use rdc=true onl…
valassi Aug 28, 2024
bf676af
[helas] first quick tput test of ggtt including -inlL option: ok for …
valassi Aug 28, 2024
ee84d7d
[helas] in CODEGEN, complete the backport from gg_tt.mad of file temp…
valassi Aug 28, 2024
4c4198f
[helas] in CODEGEN model_handling.py, complete the backport from gg_t…
valassi Aug 28, 2024
ae7d18b
[helas] in CODEGEN model_handling.py, complete the backport from gg_t…
valassi Aug 28, 2024
a58cc9c
[helas] in gg_tt.mad, move HelAmps.cc to SubProcesses and link it in …
valassi Aug 28, 2024
64875e7
[helas] in CODEGEN and gg_tt.mad, fix HelAmps.cc in HELINL=L mode and…
valassi Aug 28, 2024
9f1cfd2
[helas] regenerate gg_tt.mad, check that all is ok (codegen for HELIN…
valassi Aug 28, 2024
5ca9d2d
[helas] regenerate all processes with support for HELINL=L
valassi Aug 28, 2024
f0a5105
[helas] in tmad madX.sh and teeMadX.sh, add -inlonly and -inlLonly op…
valassi Aug 28, 2024
348ebfd
[helas] add HelAmps.cc to all regenerated processes
valassi Aug 28, 2024
3ecf99e
[helas] aborted tput test of ggttggg with all helinl values - inl1 bu…
valassi Aug 28, 2024
de8d452
[helas] rerun the ggttggg tput test only in inl0 mode - note that the…
valassi Aug 28, 2024
93f351b
[helas] manually fix the build time in the ggttggg tput test in inl0 …
valassi Aug 28, 2024
bc89719
[helas] first run of the ggttggg tput test in inlL mode - build is a …
valassi Aug 28, 2024
125b7b4
[helas] first run of the ggttggg tmad test in inlL mode - runtime is …
valassi Aug 29, 2024
a472158
[helas] in ee_mumu.mad, replace CD_ACCESS by CI_ACCESS to fix build w…
valassi Aug 29, 2024
a2b1810
[helas] in ee_mumu.mad and CODEGEN, add missing arguments (allCOUP1, …
valassi Aug 29, 2024
4a00c9c
[helas] in CODEGEN, add pairs of helas and linker functions for CD an…
valassi Aug 29, 2024
a07b914
[helas] regenerate ee_mumu.mad, with the dual series of CD_ACCESS and…
valassi Aug 29, 2024
718a84e
[helas] regenerate all processes after fixing the two eemumu issues (…
valassi Aug 29, 2024
b1f79a8
[helas] rerun tput tests (now 120 including inlL, previously 102) on …
valassi Aug 30, 2024
1fbe87a
[helas] rerun 30 tmad tests on itscrd90 - all as expected (failures i…
valassi Aug 30, 2024
7e930eb
[helas] in tmad/madX.sh, print the DATE also at the end of the test
valassi Aug 30, 2024
f25cd7a
[helas] rerun tmad ggttggg inlL
valassi Aug 30, 2024
fb0d91a
[helas] move to CODEGEN logs from the latest upstream/master for easi…
valassi Sep 2, 2024
062527c
Merge remote-tracking branch 'upstream/master' (including new CI and …
valassi Sep 2, 2024
d8bb2ca
[helas] regenerate gg_tt.mad, check all is ok
valassi Sep 2, 2024
a9a93bb
[helas] move to upstream/master tput/tmad logs for easier merging
valassi Sep 20, 2024
716ebaf
[helas] move to upstream/master gg_tt.mad codegen log for easier merging
valassi Sep 20, 2024
8044602
Merge remote-tracking branch 'upstream/master' into helas
valassi Sep 20, 2024
fbf892e
Merge branch 'amd' (with OPTFLAGS=-O2 to fix #806) into helas
valassi Sep 20, 2024
8fce3b1
Merge branch 'amd' (go back to previous upstream/master codegen logs)…
valassi Sep 21, 2024
e6561e9
[helas] regenerate all processes after merging master and amd
valassi Sep 21, 2024
86d2393
[helas] in gg_tt.mad cudacpp.mk, restrict the '-rdc' flag (for HELINL…
valassi Sep 20, 2024
52751a7
[helas] in gg_tt.mad cudacpp.mk, add -fgpu-rdc to the CPPProcess.cc c…
valassi Sep 20, 2024
b2c8bb4
[helas] in gg_tt.mad cudacpp.mk, add -fgpu-rdc --hip-link to the chec…
valassi Sep 20, 2024
e4d9206
[helas] in gg_tt.mad cudacpp.mk, temporarely go back and try to use h…
valassi Sep 20, 2024
2fda261
[helas] in gg_tt.mad cudacpp.mk, go back to using gfortran instead of…
valassi Sep 20, 2024
f021a89
[helas] backport to CODEGEN the gg_tt.mad changes in cudacpp.mk to tr…
valassi Sep 20, 2024
be5a8da
[helas] regenerate all processes - also add to repo some missing file…
valassi Sep 25, 2024
bb93d83
Merge remote-tracking branch 'upstream/master' (use -O2 instead of -O…
valassi Sep 25, 2024
c4ec5df
[helas] move to CODEGEN logs from the latest upstream/master for easi…
valassi Oct 5, 2024
b2186b9
Merge remote-tracking branch 'upstream/master' (v1.00.00, plus AMD/v1…
valassi Oct 5, 2024
@@ -10,7 +10,7 @@

#include "MemoryAccessHelpers.h"
#include "MemoryAccessVectors.h"
-#include "MemoryBuffers.h" // for HostBufferMatrixElements::isaligned
+#include "MemoryBuffers.h" // for HostBufferGs::isaligned

// NB: namespaces mg5amcGpu and mg5amcCpu includes types which are defined in different ways for CPU and GPU builds (see #318 and #725)
#ifdef MGONGPUCPP_GPUIMPL
@@ -8,7 +8,7 @@

#include "mgOnGpuConfig.h"

-#include "CPPProcess.h"
+#include "CPPProcess.h" // for CPPProcess::np4 and CPPProcess::npar (NB: npar may differ in different P* subprocess directories!)
#include "MemoryAccessHelpers.h"
#include "MemoryAccessVectors.h"

@@ -971,6 +971,8 @@ main( int argc, char** argv )
<< " [" << process.getCompiler() << "]"
#ifdef MGONGPU_INLINE_HELAMPS
<< " [inlineHel=1]"
+#elif defined MGONGPU_LINKER_HELAMPS
+<< " [inlineHel=L]"
#else
<< " [inlineHel=0]"
#endif
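
The three-way printout in this hunk could also be factored into a small helper. The following is only a minimal sketch and is not part of the PR: the function name `inlineHelTag` is hypothetical, and only the two macro names and the three tag strings are taken from the diff above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not in the PR): returns the tag that the check
// executable prints for the HelAmps build mode selected at compile time.
inline std::string inlineHelTag()
{
#if defined MGONGPU_INLINE_HELAMPS
  return "[inlineHel=1]"; // HELINL=1: helas functions are inlined
#elif defined MGONGPU_LINKER_HELAMPS
  return "[inlineHel=L]"; // HELINL=L: helas functions come from a separate object file
#else
  return "[inlineHel=0]"; // HELINL=0: default, neither macro is defined
#endif
}
```

The tag is chosen at compile time, so a build without `-DMGONGPU_INLINE_HELAMPS` or `-DMGONGPU_LINKER_HELAMPS` yields the `inlineHel=0` tag.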
@@ -0,0 +1,65 @@
// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Aug 2024) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2024) for the MG5aMC CUDACPP plugin.

#ifdef MGONGPU_LINKER_HELAMPS

#include "HelAmps_sm.h"

// -----------------------------------------------------------------------------
// *** NB: this implementation class depends on MemoryAccessMomenta,
// *** where the AOSOA definition depends on CPPProcess::npar,
// *** which may be different in different P* subprocess directories:
// *** therefore this class is presently hosted and compiled in each P*
// -----------------------------------------------------------------------------

#include "MemoryAccessAmplitudes.h"
#include "MemoryAccessCouplings.h"
#include "MemoryAccessCouplingsFixed.h"
#include "MemoryAccessGs.h"
#include "MemoryAccessMatrixElements.h"
#include "MemoryAccessMomenta.h"
#include "MemoryAccessWavefunctions.h"

#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
#include "MemoryAccessDenominators.h"
#include "MemoryAccessNumerators.h"
#endif

#ifdef MGONGPUCPP_GPUIMPL
namespace mg5amcGpu
#else
namespace mg5amcCpu
#endif
{
//--------------------------------------------------------------------------

#ifdef MGONGPUCPP_GPUIMPL
using M_ACCESS = DeviceAccessMomenta; // non-trivial access: buffer includes all events
using E_ACCESS = DeviceAccessMatrixElements; // non-trivial access: buffer includes all events
using W_ACCESS = DeviceAccessWavefunctions; // TRIVIAL ACCESS (no kernel splitting yet): buffer for one event
using A_ACCESS = DeviceAccessAmplitudes; // TRIVIAL ACCESS (no kernel splitting yet): buffer for one event
using CD_ACCESS = DeviceAccessCouplings; // non-trivial access (dependent couplings): buffer includes all events
using CI_ACCESS = DeviceAccessCouplingsFixed; // TRIVIAL access (independent couplings): buffer for one event
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
using NUM_ACCESS = DeviceAccessNumerators; // non-trivial access: buffer includes all events
using DEN_ACCESS = DeviceAccessDenominators; // non-trivial access: buffer includes all events
#endif
#else
using namespace ::mg5amcCpu;
using M_ACCESS = HostAccessMomenta; // non-trivial access: buffer includes all events
using E_ACCESS = HostAccessMatrixElements; // non-trivial access: buffer includes all events
using W_ACCESS = HostAccessWavefunctions; // TRIVIAL ACCESS (no kernel splitting yet): buffer for one event
using A_ACCESS = HostAccessAmplitudes; // TRIVIAL ACCESS (no kernel splitting yet): buffer for one event
using CD_ACCESS = HostAccessCouplings; // non-trivial access (dependent couplings): buffer includes all events
using CI_ACCESS = HostAccessCouplingsFixed; // TRIVIAL access (independent couplings): buffer for one event
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
using NUM_ACCESS = HostAccessNumerators; // non-trivial access: buffer includes all events
using DEN_ACCESS = HostAccessDenominators; // non-trivial access: buffer includes all events
#endif
#endif

//--------------------------------------------------------------------------
%(function_definitions2)s}
#endif
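
To illustrate the idea behind the HELINL=L mode, the sketch below exposes a templated function through non-template wrappers, so that the template is instantiated once in its own translation unit and callers only need the wrapper declarations at link time. Everything in it is a toy assumption: `toyVertex`, the two `ToyAccess` structs and the arithmetic are hypothetical, and only the `linker_CD_` / `linker_CI_` naming pattern is borrowed from the symbol `linker_CD_FFV1_0` quoted in the build rules. For brevity, the "header" and "HelAmps.cc" parts are shown in a single file.

```cpp
#include <cassert>
#include <complex>

using cxd = std::complex<double>;

// Hypothetical stand-ins for the coupling access classes (illustrative only)
struct ToyAccessCD // "dependent" couplings: the buffer holds one value per event
{
  static cxd read( const cxd* buffer, int ievt ) { return buffer[ievt]; }
};
struct ToyAccessCI // "independent" couplings: the buffer holds one fixed value
{
  static cxd read( const cxd* buffer, int /*ievt*/ ) { return buffer[0]; }
};

// "Header" part: templated helas-like function (toy arithmetic, not real physics)
template<class C_ACCESS>
inline cxd toyVertex( const cxd* couplings, int ievt, cxd wavefunction )
{
  return C_ACCESS::read( couplings, ievt ) * wavefunction;
}

// "Header" part: non-template entry points, the only symbols callers must see
cxd linker_CD_toyVertex( const cxd* couplings, int ievt, cxd wavefunction );
cxd linker_CI_toyVertex( const cxd* couplings, int ievt, cxd wavefunction );

// "HelAmps.cc" part: each wrapper instantiates the template exactly once
cxd linker_CD_toyVertex( const cxd* couplings, int ievt, cxd wavefunction )
{
  return toyVertex<ToyAccessCD>( couplings, ievt, wavefunction );
}
cxd linker_CI_toyVertex( const cxd* couplings, int ievt, cxd wavefunction )
{
  return toyVertex<ToyAccessCI>( couplings, ievt, wavefunction );
}
```

In a real split, the wrapper definitions would sit in a separate source file compiled to its own object, and the calling code would include only the declarations. On GPUs this kind of cross-object device call is what requires relocatable device code, which is why the build rules add the RDC flags.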
@@ -556,8 +556,11 @@ $(info HELINL='$(HELINL)')
ifeq ($(HELINL),1)
CXXFLAGS += -DMGONGPU_INLINE_HELAMPS
GPUFLAGS += -DMGONGPU_INLINE_HELAMPS
+else ifeq ($(HELINL),L)
+CXXFLAGS += -DMGONGPU_LINKER_HELAMPS
+GPUFLAGS += -DMGONGPU_LINKER_HELAMPS
else ifneq ($(HELINL),0)
-$(error Unknown HELINL='$(HELINL)': only '0' and '1' are supported)
+$(error Unknown HELINL='$(HELINL)': only 'L', '0' and '1' are supported)
endif

# Set the build flags appropriate to each HRDCOD choice (example: "make HRDCOD=1")
@@ -660,7 +663,6 @@ override RUNTIME =
#=== Makefile TARGETS and build rules below
#===============================================================================


ifeq ($(GPUCC),)
cxx_checkmain=$(BUILDDIR)/check_cpp.exe
cxx_fcheckmain=$(BUILDDIR)/fcheck_cpp.exe
@@ -789,6 +791,19 @@ gpu_objects_lib=$(BUILDDIR)/CPPProcess_$(GPUSUFFIX).o $(BUILDDIR)/MatrixElementK
gpu_objects_exe=$(BUILDDIR)/CommonRandomNumberKernel_$(GPUSUFFIX).o $(BUILDDIR)/RamboSamplingKernels_$(GPUSUFFIX).o
endif

+# Add object files and special build flags only for the HELINL=L mode
+ifeq ($(HELINL),L)
+cxx_objects_lib+=$(BUILDDIR)/HelAmps_cpp.o
+gpu_objects_lib+=$(BUILDDIR)/HelAmps_$(GPUSUFFIX).o
+ifeq ($(findstring nvcc,$(GPUCC)),nvcc) # Nvidia GPU build
+$(BUILDDIR)/CPPProcess_$(GPUSUFFIX).o: GPUFLAGS += -rdc true # compilation fails if this is not added (ptxas fatal: Unresolved extern function)
+$(BUILDDIR)/HelAmps_$(GPUSUFFIX).o: GPUFLAGS += -rdc true # runtime fails if this is not added ('invalid device symbol' in CPPProcess.cc cHel to tHel copy)
+else ifeq ($(findstring hipcc,$(GPUCC)),hipcc) # AMD GPU build
+$(BUILDDIR)/CPPProcess_$(GPUSUFFIX).o: GPUFLAGS += -fgpu-rdc # compilation fails if this is not added (lld: error: undefined hidden symbol: mg5amcGpu::linker_CD_FFV1_0)
+$(gpu_checkmain): LIBFLAGS += -fgpu-rdc --hip-link
+endif
+endif

# Target (and build rules): C++ and CUDA/HIP shared libraries
$(LIBDIR)/lib$(MG5AMC_CXXLIB).so: $(BUILDDIR)/fbridge_cpp.o
$(LIBDIR)/lib$(MG5AMC_CXXLIB).so: cxx_objects_lib += $(BUILDDIR)/fbridge_cpp.o
@@ -799,12 +814,12 @@ ifneq ($(GPUCC),)
$(LIBDIR)/lib$(MG5AMC_GPULIB).so: $(BUILDDIR)/fbridge_$(GPUSUFFIX).o
$(LIBDIR)/lib$(MG5AMC_GPULIB).so: gpu_objects_lib += $(BUILDDIR)/fbridge_$(GPUSUFFIX).o
$(LIBDIR)/lib$(MG5AMC_GPULIB).so: $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(gpu_objects_lib)
-$(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB)
+$(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPUARCHFLAGS) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB)
# Bypass std::filesystem completely to ease portability on LUMI #803
#ifneq ($(findstring hipcc,$(GPUCC)),)
-# $(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB) -lstdc++fs
+# $(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPUARCHFLAGS) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB) -lstdc++fs
#else
-# $(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB)
+# $(GPUCC) --shared -o $@ $(gpu_objects_lib) $(GPUARCHFLAGS) $(GPULIBFLAGSRPATH2) -L$(LIBDIR) -l$(MG5AMC_COMMONLIB)
#endif
endif

@@ -975,6 +990,7 @@ $(cxx_testmain): LIBFLAGS += $(CXXLIBFLAGSRPATH) # avoid the need for LD_LIBRARY
$(cxx_testmain): $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(cxx_objects_lib) $(cxx_objects_exe) $(GTESTLIBS)
$(CXX) -o $@ $(cxx_objects_lib) $(cxx_objects_exe) -ldl -pthread $(LIBFLAGS)
else # link only runTest_$(GPUSUFFIX).o (new: in the past, this was linking both runTest_cpp.o and runTest_$(GPUSUFFIX).o)
+$(gpu_testmain): LIBFLAGS += $(GPUARCHFLAGS) # avoid "nvlink warning: SM Arch not found" when using rdc
###$(gpu_testmain): LIBFLAGS += $(GPULIBFLAGSASAN)
$(gpu_testmain): LIBFLAGS += $(GPULIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
$(gpu_testmain): $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(gpu_objects_lib) $(gpu_objects_exe) $(GTESTLIBS)
@@ -40,7 +40,7 @@ ifneq ($(words $(filter $(FPTYPE), $(SUPPORTED_FPTYPES))),1)
$(error Invalid fptype FPTYPE='$(FPTYPE)': supported fptypes are $(foreach fptype,$(SUPPORTED_FPTYPES),'$(fptype)'))
endif

-override SUPPORTED_HELINLS = 0 1
+override SUPPORTED_HELINLS = L 0 1
ifneq ($(words $(filter $(HELINL), $(SUPPORTED_HELINLS))),1)
$(error Invalid helinl HELINL='$(HELINL)': supported helinls are $(foreach helinl,$(SUPPORTED_HELINLS),'$(helinl)'))
endif
@@ -29,7 +29,7 @@

// Choose if curand is supported for generating random numbers
// For HIP, by default, do not allow curand to be used (hiprand or common random numbers will be used instead)
-// For both CUDA and C++, by default, do not inline, but allow this macro to be set from outside with e.g. -DMGONGPU_HAS_NO_CURAND
+// For both CUDA and C++, by default, do not skip curand, but allow this macro to be set from outside with e.g. -DMGONGPU_HAS_NO_CURAND
// (there exist CUDA installations, e.g. using the HPC package, which do not include curand - see PR #784 and #785)
#if defined __HIPCC__
#define MGONGPU_HAS_NO_CURAND 1
@@ -45,7 +45,7 @@

// Choose if hiprand is supported for generating random numbers
// For CUDA, by default, do not allow hiprand to be used (curand or common random numbers will be used instead)
-// For both HIP and C++, by default, do not inline, but allow this macro to be set from outside with e.g. -DMGONGPU_HAS_NO_HIPRAND
+// For both HIP and C++, by default, do not skip hiprand, but allow this macro to be set from outside with e.g. -DMGONGPU_HAS_NO_HIPRAND
// (there may exist HIP installations which do not include hiprand?)
#if defined __CUDACC__ // this must be __CUDACC__ (not MGONGPUCPP_GPUIMPL)
#define MGONGPU_HAS_NO_HIPRAND 1
@@ -77,9 +77,16 @@
// Choose whether to inline all HelAmps functions
// This optimization can gain almost a factor 4 in C++, similar to -flto (issue #229)
// By default, do not inline, but allow this macro to be set from outside with e.g. -DMGONGPU_INLINE_HELAMPS
+// (NB: MGONGPU_INLINE_HELAMPS and MGONGPU_LINKER_HELAMPS are mutually exclusive)
//#undef MGONGPU_INLINE_HELAMPS // default
////#define MGONGPU_INLINE_HELAMPS 1

+// Choose whether to compile and link all HelAmps functions as separate object files
+// By default, do not link, but allow this macro to be set from outside with e.g. -DMGONGPU_LINKER_HELAMPS
+// (NB: MGONGPU_INLINE_HELAMPS and MGONGPU_LINKER_HELAMPS are mutually exclusive)
+//#undef MGONGPU_LINKER_HELAMPS // default
+////#define MGONGPU_LINKER_HELAMPS 1

// Choose whether to hardcode the cIPD physics parameters rather than reading them from user cards
// This optimization can gain 20%% in CUDA in eemumu (issue #39)
// By default, do not hardcode, but allow this macro to be set from outside with e.g. -DMGONGPU_HARDCODE_PARAM
@@ -156,6 +163,11 @@
#endif
#endif

+// SANITY CHECKS (HelAmps)
+#if defined MGONGPU_INLINE_HELAMPS and defined MGONGPU_LINKER_HELAMPS
+#error You must CHOOSE (AT MOST) ONLY ONE of MGONGPU_INLINE_HELAMPS or MGONGPU_LINKER_HELAMPS
+#endif

// NB: namespace mgOnGpu includes types which are defined in exactly the same way for CPU and GPU builds (see #318 and #725)
namespace mgOnGpu
{
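
The mutual-exclusion check added in this file uses `#error`. As a point of comparison only, the same constraint can be written with `static_assert` over compile-time flags. This is a sketch under stated assumptions and not what the PR does: the enum `HelAmpsMode` and the constants `inlineRequested`, `linkerRequested` and `helAmpsMode` are hypothetical names, and only the two macro names come from the diff.

```cpp
#include <cassert>

// Hypothetical names (not in the PR): the three HelAmps build modes
enum class HelAmpsMode { NotInlined, Inlined, Linked };

// Translate the two preprocessor macros into compile-time booleans
#ifdef MGONGPU_INLINE_HELAMPS
constexpr bool inlineRequested = true;
#else
constexpr bool inlineRequested = false;
#endif

#ifdef MGONGPU_LINKER_HELAMPS
constexpr bool linkerRequested = true;
#else
constexpr bool linkerRequested = false;
#endif

// Same constraint as the #error check: at most one of the two may be set
static_assert( !( inlineRequested && linkerRequested ),
               "Choose at most one of MGONGPU_INLINE_HELAMPS and MGONGPU_LINKER_HELAMPS" );

// Single compile-time value describing the selected mode
constexpr HelAmpsMode helAmpsMode =
  inlineRequested ? HelAmpsMode::Inlined
                  : ( linkerRequested ? HelAmpsMode::Linked : HelAmpsMode::NotInlined );
```

The `#error` form stops the build during preprocessing, while the `static_assert` form also provides a typed value that later code could branch on.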