Reenable OPENMP multithreading in cudacpp #577

Merged (42 commits) on Dec 19, 2022

Commits
cacd48d
[omp] in gg_tt.mad reenable OpenMP MT in cudacpp #575
valassi Dec 19, 2022
9172091
[omp] in CODEGEN backport reenabling of OMP MT #575
valassi Dec 19, 2022
912020d
[omp] regenerate ggtt mad - all stable
valassi Dec 19, 2022
8055444
[omp] regenerate the other 4 processes mad
valassi Dec 19, 2022
3348c61
[omp] regenerate all 6 processes sa
valassi Dec 19, 2022
60fb04f
[omp] in gg_tt.mad fix OMP build for CUDA_HOME=none (add -lgomp to ru…
valassi Dec 19, 2022
6ee189d
[omp] in CODEGEN backport -lgomp fix
valassi Dec 19, 2022
716f085
[omp] regenerate gg_tt.mad, check all ok
valassi Dec 19, 2022
bf2a2a5
[omp] manually copy ggt_mad cudacpp.mk to the other 4 mad and to all …
valassi Dec 19, 2022
3b886c3
[omp] in ggtt.sa fix OMP #575 when MULTICHANNEL is disabled #568
valassi Dec 19, 2022
226c482
[omp] in CODEGEN backport OMP fix with MULTICHANNEL disabled
valassi Dec 19, 2022
1520997
[omp] regenerate ggtt.sa - all is stable
valassi Dec 19, 2022
80b53a1
[omp] regenerate all 6 sa and 5 mad - complete reenabling of OMP MT #…
valassi Dec 19, 2022
34b3487
[omp] in ggtt.mad try to fix OMP with clang - stil fails, will revert
valassi Dec 19, 2022
a8be519
Revert "[omp] in ggtt.mad try to fix OMP with clang - stil fails, wil…
valassi Dec 19, 2022
e640b1d
[omp] in ggtt.mad try another fix for OMP in clang, still fails, will…
valassi Dec 19, 2022
e15602d
Revert "[omp] try another fix for OMP in clang, still fails, will rev…
valassi Dec 19, 2022
4a4d9ec
[omp] in ggtt.sa disable OMP in clang
valassi Dec 19, 2022
296b7c1
[omp] in ggtt.sa timermap.h fix an icpx build warning
valassi Dec 19, 2022
20570fd
[omp] in ggtt.mad port the timermap build warning fix from ggtt.sa
valassi Dec 19, 2022
9423f4e
[omp] in ggtt.mad port the omp build fixes in cudacpp.mk from ggtt.sa
valassi Dec 19, 2022
ea17b20
[omp] in ggtt.mad ompnumthreads.cc disable the build if _OPENMP is no…
valassi Dec 19, 2022
a7301b6
[omp] in ggtt.mad driver.f disable OMPNUMTHREADS_NOT_SET_MEANS_ONE_TH…
valassi Dec 19, 2022
a8301bf
[omp] in ggttmad makefile disable OMP for icpx in the build of Fortra…
valassi Dec 19, 2022
8ce243e
[omp] in ggttmad makefile, reenable OMP for icpx in Fortran/madevent,…
valassi Dec 19, 2022
6788acc
[omp] in CODEGEN port the omp build fixes in cudacpp.mk from ggtt.sa/mad
valassi Dec 19, 2022
42f5609
[omp] in CODEGEN ompnumthreads.cc disable the build if _OPENMP is not…
valassi Dec 19, 2022
e7a78c1
[omp] in CODEGEN backport to patch.P1 and patch.common
valassi Dec 19, 2022
ccdda17
[omp] in CODEGEN backport timermap.h with icpx fixes
valassi Dec 19, 2022
85d9b33
[omp] regenerate ggtt mad and sa, both stable
valassi Dec 19, 2022
52dc5b5
[omp] in ggttsa separate clang and Intel for openmp in cudacpp.mk
valassi Dec 19, 2022
b8af3a8
[omp] in ggtt.sa try again to add omp for icpx, dfails and will revert
valassi Dec 19, 2022
e763077
Revert "[omp] in ggtt.sa try again to add omp for icpx, dfails and wi…
valassi Dec 19, 2022
aad2eb3
[omp] in ggtt.sa partial fix for OMP on icpx in cudacpp.mk: ok withou…
valassi Dec 19, 2022
ea85f24
[omp] in CODEGEN backport the last attempts from ggtt.sa to fix omp o…
valassi Dec 19, 2022
ef2ab6e
[omp] regenerate ggttsa, all stable
valassi Dec 19, 2022
460a598
[omp] regenerate ggtt mad
valassi Dec 19, 2022
e7a3a0e
[omp] regenerate 5 mad and 6 sa - completed OMP reenabling on gcc #57…
valassi Dec 19, 2022
5bf24cb
[omp] in ggttsa disable openmp also in Apple clangy
valassi Dec 19, 2022
e58d0f6
[omp] in CODEGEN backport Apple clang fixes for omp
valassi Dec 19, 2022
61f9d01
[omp] regenerate ggtt sa, all ok stable
valassi Dec 19, 2022
23be18e
[omp] ** COMPLETE OMP ** copy cudacpp.mk from ggtt.sa to the other 5 …
valassi Dec 19, 2022
2 changes: 2 additions & 0 deletions epochX/cudacpp/CODEGEN/MG5aMC_patches/PROD/ompnumthreads.cc
@@ -8,6 +8,7 @@
// Hence use 'extern "C"' to avoid name mangling by the C++ compiler
// See https://www.geeksforgeeks.org/extern-c-in-c

#ifdef _OPENMP
extern "C"
{
void ompnumthreads_not_set_means_one_thread_()
@@ -16,3 +17,4 @@ extern "C"
ompnumthreadsNotSetMeansOneThread( debuglevel ); // call the inline C++ function defined in the .h file
}
}
#endif
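
Editor's note, not part of the diff: the change above compiles the Fortran-callable wrapper only when OpenMP is active, so builds without -fopenmp neither reference OpenMP symbols nor call the setup routine. A minimal standalone sketch of the same pattern follows; the real logic lives in ompnumthreads.h (ompnumthreadsNotSetMeansOneThread), and the environment check and printout here are illustrative assumptions, not the repository code.

// Illustrative sketch only (not the repository's ompnumthreads.cc/.h):
// a Fortran-callable wrapper, guarded by _OPENMP, that defaults to one
// thread when the user has not set OMP_NUM_THREADS.
#include <cstdio>
#include <cstdlib>
#ifdef _OPENMP
#include <omp.h>
extern "C"
{
  // Trailing underscore matches the usual gfortran name mangling, so Fortran
  // can simply CALL OMPNUMTHREADS_NOT_SET_MEANS_ONE_THREAD()
  void ompnumthreads_not_set_means_one_thread_()
  {
    // If OMP_NUM_THREADS is not set, use one thread instead of all cores
    if( std::getenv( "OMP_NUM_THREADS" ) == nullptr ) omp_set_num_threads( 1 );
    std::printf( "OpenMP will use %d thread(s)\n", omp_get_max_threads() );
  }
}
#endif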
16 changes: 9 additions & 7 deletions epochX/cudacpp/CODEGEN/MG5aMC_patches/PROD/patch.P1
@@ -1,5 +1,5 @@
diff --git b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
index 62b656862..0ae2524b4 100644
index aa01cb976..50d82f805 100644
--- b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
+++ a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
@@ -463,23 +463,140 @@ C
@@ -157,11 +157,11 @@ index 62b656862..0ae2524b4 100644
END

diff --git b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
index 295e7de8d..19aa50965 100644
index a76de8ec5..ab38b2202 100644
--- b/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
+++ a/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f
@@ -74,13 +74,52 @@ c common/to_colstats/ncols,ncolflow,ncolalt,ic
include 'vector.inc'
@@ -74,13 +74,54 @@ c common/to_colstats/ncols,ncolflow,ncolalt,ic
include 'vector.inc' ! needed by coupl.inc (defines VECSIZE_MEMMAX)
include 'coupl.inc'
INTEGER VECSIZE_USED
- DATA VECSIZE_USED/VECSIZE_MEMMAX/ ! can be changed at runtime
@@ -179,7 +179,9 @@ index 295e7de8d..19aa50965 100644
call cpu_time(t_before)
CUMULATED_TIMING = t_before
+
+#ifdef _OPENMP
+ CALL OMPNUMTHREADS_NOT_SET_MEANS_ONE_THREAD()
+#endif
+ CALL COUNTERS_INITIALISE()
+
+c#ifdef MG5AMC_MEEXPORTER_CUDACPP
@@ -214,7 +216,7 @@ index 295e7de8d..19aa50965 100644
c
c Read process number
c
@@ -135,7 +174,8 @@ c If CKKW-type matching, read IS Sudakov grid
@@ -135,7 +176,8 @@ c If CKKW-type matching, read IS Sudakov grid
exit
30 issgridfile='../'//issgridfile
if(i.eq.5)then
@@ -224,7 +226,7 @@ index 295e7de8d..19aa50965 100644
stop
endif
enddo
@@ -202,8 +242,33 @@ c call sample_result(xsec,xerr)
@@ -202,8 +244,33 @@ c call sample_result(xsec,xerr)
c write(*,*) 'Final xsec: ',xsec

rewind(lun)
@@ -259,7 +261,7 @@ index 295e7de8d..19aa50965 100644
end

c $B$ get_user_params $B$ ! tag for MadWeight
@@ -381,7 +446,7 @@ c
@@ -381,7 +448,7 @@ c
fopened=.false.
tempname=filename
fine=index(tempname,' ')
33 changes: 21 additions & 12 deletions epochX/cudacpp/CODEGEN/MG5aMC_patches/PROD/patch.common
@@ -24,7 +24,7 @@ index a6907622e..3c1e4fdf8 100644
+ PARAMETER (VECSIZE_MEMMAX=16384) ! NB: 16k events per GPU grid is the minimum required to fill a V100 GPU
+c PARAMETER (VECSIZE_MEMMAX=32) ! NB: workaround for out-of-memory on Juwels: 32 is enough for no-CUDA builds (issue #498)
diff --git b/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile a/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile
index dd709f52c..b7e084145 100644
index dd709f52c..365e9d0ed 100644
--- b/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile
+++ a/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile
@@ -1,6 +1,19 @@
@@ -75,7 +75,7 @@ index dd709f52c..b7e084145 100644

LIBS = $(LIBDIR)libbias.$(libext) $(LIBDIR)libdhelas.$(libext) $(LIBDIR)libdsample.$(libext) $(LIBDIR)libgeneric.$(libext) $(LIBDIR)libpdf.$(libext) $(LIBDIR)libmodel.$(libext) $(LIBDIR)libcernlib.$(libext) $(MADLOOP_LIB) $(LOOP_LIBS)

@@ -43,24 +75,69 @@ ifeq ($(strip $(MATRIX_HEL)),)
@@ -43,24 +75,76 @@ ifeq ($(strip $(MATRIX_HEL)),)
endif


@@ -103,8 +103,15 @@ index dd709f52c..b7e084145 100644
+all: $(PROG) $(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp # also builds g$(PROG)_cudacpp if $(CUDACPP_CULIB) exists (#503)
+endif
+
+ifneq ($(shell $(CXX) --version | egrep '^Intel'),)
+override OMPFLAGS = -fopenmp
+LINKLIBS += -lintlc # undefined reference to `_intel_fast_memcpy'
+else
+override OMPFLAGS = -fopenmp
+endif
+
+$(PROG): $(PROCESS) $(DSIG) auto_dsig.o $(LIBS) $(MATRIX) counters.o ompnumthreads.o
+ $(FC) -o $(PROG) $(PROCESS) $(DSIG) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) -fopenmp counters.o ompnumthreads.o $(LDFLAGS)
+ $(FC) -o $(PROG) $(PROCESS) $(DSIG) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) $(OMPFLAGS) counters.o ompnumthreads.o $(LDFLAGS)
+
+$(LIBS): .libs
+
@@ -131,17 +138,18 @@ index dd709f52c..b7e084145 100644
+
+# Also builds g$(PROG)_cudacpp if $(CUDACPP_CULIB) exists (improved patch for cpp-only builds #503)
+$(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp: $(PROCESS) $(DSIG_cudacpp) auto_dsig.o $(LIBS) $(MATRIX) counters.o ompnumthreads.o $(CUDACPP_BUILDDIR)/.cudacpplibs
+ $(FC) -o $(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp $(PROCESS) $(DSIG_cudacpp) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) -fopenmp counters.o ompnumthreads.o -L$(LIBDIR)/$(CUDACPP_BUILDDIR) -l$(CUDACPP_COMMONLIB) -l$(CUDACPP_CXXLIB) $(LIBFLAGSRPATH) $(LDFLAGS)
+ if [ -f $(LIBDIR)/$(CUDACPP_BUILDDIR)/lib$(CUDACPP_CULIB).* ]; then $(FC) -o $(CUDACPP_BUILDDIR)/g$(PROG)_cudacpp $(PROCESS) $(DSIG_cudacpp) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) -fopenmp counters.o ompnumthreads.o -L$(LIBDIR)/$(CUDACPP_BUILDDIR) -l$(CUDACPP_COMMONLIB) -l$(CUDACPP_CULIB) $(LIBFLAGSRPATH) $(LDFLAGS); fi
+ $(FC) -o $(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp $(PROCESS) $(DSIG_cudacpp) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) $(OMPFLAGS) counters.o ompnumthreads.o -L$(LIBDIR)/$(CUDACPP_BUILDDIR) -l$(CUDACPP_COMMONLIB) -l$(CUDACPP_CXXLIB) $(LIBFLAGSRPATH) $(LDFLAGS)
+ if [ -f $(LIBDIR)/$(CUDACPP_BUILDDIR)/lib$(CUDACPP_CULIB).* ]; then $(FC) -o $(CUDACPP_BUILDDIR)/g$(PROG)_cudacpp $(PROCESS) $(DSIG_cudacpp) auto_dsig.o $(MATRIX) $(LINKLIBS) $(BIASDEPENDENCIES) $(OMPFLAGS) counters.o ompnumthreads.o -L$(LIBDIR)/$(CUDACPP_BUILDDIR) -l$(CUDACPP_COMMONLIB) -l$(CUDACPP_CULIB) $(LIBFLAGSRPATH) $(LDFLAGS); fi
+
+counters.o: counters.cc timer.h
+ $(CXX) -std=c++11 -Wall -Wshadow -Wextra -c $< -o $@
+
+ompnumthreads.o: ompnumthreads.cc ompnumthreads.h
+ $(CXX) -std=c++11 -Wall -Wshadow -Wextra -fopenmp -c $< -o $@
+ $(CXX) -std=c++11 -Wall -Wshadow -Wextra $(OMPFLAGS) -c $< -o $@

$(PROG)_forhel: $(PROCESS) auto_dsig.o $(LIBS) $(MATRIX_HEL)
$(FC) -o $(PROG)_forhel $(PROCESS) $(MATRIX_HEL) $(LINKLIBS) $(LDFLAGS) $(BIASDEPENDENCIES) -fopenmp
- $(FC) -o $(PROG)_forhel $(PROCESS) $(MATRIX_HEL) $(LINKLIBS) $(LDFLAGS) $(BIASDEPENDENCIES) -fopenmp
+ $(FC) -o $(PROG)_forhel $(PROCESS) $(MATRIX_HEL) $(LINKLIBS) $(LDFLAGS) $(BIASDEPENDENCIES) $(OMPFLAGS)

gensym: $(SYMMETRY) configs.inc $(LIBS)
- $(FC) -o gensym $(SYMMETRY) -L../../lib/ $(LINKLIBS) $(LDFLAGS)
@@ -151,24 +159,25 @@ index dd709f52c..b7e084145 100644
$(LIBDIR)libmodel.$(libext): ../../Cards/param_card.dat
cd ../../Source/MODEL; make

@@ -69,12 +146,15 @@ $(LIBDIR)libgeneric.$(libext): ../../Cards/run_card.dat
@@ -69,12 +153,15 @@ $(LIBDIR)libgeneric.$(libext): ../../Cards/run_card.dat

$(LIBDIR)libpdf.$(libext):
cd ../../Source/PDF; make
+endif

# Add source so that the compiler finds the DiscreteSampler module.
$(MATRIX): %.o: %.f
$(FC) $(FFLAGS) $(MATRIX_FLAG) -c $< -I../../Source/ -fopenmp
- $(FC) $(FFLAGS) $(MATRIX_FLAG) -c $< -I../../Source/ -fopenmp
+ $(FC) $(FFLAGS) $(MATRIX_FLAG) -c $< -I../../Source/ $(OMPFLAGS)
%.o: %.f
- $(FC) $(FFLAGS) -c $< -I../../Source/ -fopenmp
+ $(FC) $(FFLAGS) -c $< -I../../Source/ -fopenmp -o $@
+ $(FC) $(FFLAGS) -c $< -I../../Source/ $(OMPFLAGS) -o $@
+%_cudacpp.o: %.f
+ $(FC) $(FFLAGS) -c -DMG5AMC_MEEXPORTER_CUDACPP $< -I../../Source/ -fopenmp -o $@
+ $(FC) $(FFLAGS) -c -DMG5AMC_MEEXPORTER_CUDACPP $< -I../../Source/ $(OMPFLAGS) -o $@

# Dependencies

@@ -94,5 +174,71 @@ unwgt.o: genps.inc nexternal.inc symswap.inc cluster.inc run.inc message.inc \
@@ -94,5 +181,71 @@ unwgt.o: genps.inc nexternal.inc symswap.inc cluster.inc run.inc message.inc \
run_config.inc
initcluster.o: message.inc

@@ -12,6 +12,7 @@
#include "RamboSamplingKernels.h"
#include "RandomNumberKernels.h"
#include "epoch_process_id.h"
#include "ompnumthreads.h"
#include "timermap.h"

#include <unistd.h>
@@ -169,11 +169,14 @@ endif
#=== Configure defaults and check if user-defined choices exist for OMPFLAGS, AVX, FPTYPE, HELINL, HRDCOD, RNDGEN

# Set the default OMPFLAGS choice
ifneq ($(shell $(CXX) --version | grep ^Intel),)
override OMPFLAGS = # disable OpenMP on the Intel compiler (on gcc this requires gcc>=9.3, issue #269)
ifneq ($(shell $(CXX) --version | egrep '^Intel'),)
override OMPFLAGS = # disable OpenMP MT on Intel (ok without nvcc, not ok with nvcc)
else ifneq ($(shell $(CXX) --version | egrep '^(clang|Apple clang)'),)
override OMPFLAGS = # disable OpenMP MT on clang (not ok without or with nvcc)
else
override OMPFLAGS = -fopenmp
###override OMPFLAGS = # disable OpenMP MT (default before #575)
endif
###OMPFLAGS ?= -fopenmp # TEMPORARELY DISABLE OMP (need to reassess MT)
override OMPFLAGS = # TEMPORARELY DISABLE OMP (need to reassess MT)

# Set the default AVX (vectorization) choice
ifeq ($(AVX),)
@@ -433,7 +436,7 @@ endif

# Avoid clang warning "overriding '-ffp-contract=fast' option with '-ffp-contract=on'" (#516)
# This patch does remove the warning, but I prefer to keep it disabled for the moment...
###ifneq ($(shell $(CXX) --version | egrep '^(clang|Intel)'),)
###ifneq ($(shell $(CXX) --version | egrep '^(clang|Apple clang|Intel)'),)
###$(BUILDDIR)/CrossSectionKernels.o: CXXFLAGS += -Wno-overriding-t-option
###ifneq ($(NVCC),)
###$(BUILDDIR)/gCrossSectionKernels.o: CUFLAGS += -Xcompiler -Wno-overriding-t-option
@@ -529,7 +532,7 @@ $(fcxx_main): LIBFLAGS += -L$(shell dirname $(shell $(FC) --print-file-name libg
endif
$(fcxx_main): LIBFLAGS += $(CXXLIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
$(fcxx_main): $(BUILDDIR)/fcheck_sa.o $(BUILDDIR)/fsampler.o $(LIBDIR)/lib$(MG5AMC_CXXLIB).so $(cxx_objects_exe)
$(CXX) -o $@ $(BUILDDIR)/fcheck_sa.o $(BUILDDIR)/fsampler.o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_CXXLIB) $(cxx_objects_exe) $(CULIBFLAGS)
$(CXX) -o $@ $(BUILDDIR)/fcheck_sa.o $(OMPFLAGS) $(BUILDDIR)/fsampler.o $(LIBFLAGS) -lgfortran -L$(LIBDIR) -l$(MG5AMC_CXXLIB) $(cxx_objects_exe) $(CULIBFLAGS)

ifneq ($(NVCC),)
ifneq ($(shell $(CXX) --version | grep ^Intel),)
@@ -598,14 +601,24 @@ ifneq ($(shell $(CXX) --version | grep ^clang),)
$(testmain): LIBFLAGS += -L$(patsubst %%bin/clang++,%%lib,$(shell which $(firstword $(subst ccache ,,$(CXX))) | tail -1))
endif

ifneq ($(OMPFLAGS),)
ifneq ($(shell $(CXX) --version | egrep '^Intel'),)
###$(testmain): LIBFLAGS += -qopenmp -static-intel # see https://stackoverflow.com/questions/45909648/explicitly-link-intel-icpc-openmp
else ifneq ($(shell $(CXX) --version | egrep '^(clang|Apple clang)'),)
###$(testmain): LIBFLAGS += ??? # OpenMP on clang is not yet supported in cudacpp...
else
$(testmain): LIBFLAGS += -lgomp
endif
endif

ifeq ($(NVCC),) # link only runTest.o
$(testmain): LIBFLAGS += $(CXXLIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
$(testmain): $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(cxx_objects_lib) $(cxx_objects_exe) $(GTESTLIBS)
$(CXX) -o $@ $(cxx_objects_lib) $(cxx_objects_exe) -ldl -pthread $(LIBFLAGS) $(CULIBFLAGS)
else # link both runTest.o and runTest_cu.o
$(testmain): LIBFLAGS += $(CULIBFLAGSRPATH) # avoid the need for LD_LIBRARY_PATH
$(testmain): $(LIBDIR)/lib$(MG5AMC_COMMONLIB).so $(cxx_objects_lib) $(cxx_objects_exe) $(cu_objects_lib) $(cu_objects_exe) $(GTESTLIBS)
$(NVCC) -o $@ $(cxx_objects_lib) $(cxx_objects_exe) $(cu_objects_lib) $(cu_objects_exe) -ldl $(LIBFLAGS) -lcuda -lgomp $(CULIBFLAGS)
$(NVCC) -o $@ $(cxx_objects_lib) $(cxx_objects_exe) $(cu_objects_lib) $(cu_objects_exe) -ldl $(LIBFLAGS) -lcuda $(CULIBFLAGS)
endif

# Use flock (Linux only, no Mac) to allow 'make -j' if googletest has not yet been downloaded https://stackoverflow.com/a/32666215
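
Editor's note, not part of the patch: OMPFLAGS is now chosen per compiler (empty for clang/Apple clang and for icpx with nvcc, -fopenmp for gcc), so a quick way to verify what a given toolchain actually enables is a standalone probe like the sketch below. The file name and output format are illustrative assumptions, not part of this repository.

// omp_probe.cpp (hypothetical file name): build with and without -fopenmp,
// e.g. "g++ -fopenmp omp_probe.cpp -o omp_probe", to check whether OpenMP
// multithreading is actually enabled by the chosen flags.
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif
int main()
{
#ifdef _OPENMP
  std::printf( "_OPENMP=%d, omp_get_max_threads()=%d\n", _OPENMP, omp_get_max_threads() );
#else
  std::printf( "OpenMP is disabled in this build\n" );
#endif
  return 0;
}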
@@ -96,16 +96,19 @@
#else
const int npagV2 = npagV; // loop on one SIMD page (neppV events) at a time
#endif
/*
#ifdef _OPENMP
// (NB gcc9 or higher, or clang, is required)
// OMP multithreading #575 (NB: tested only with gcc11 so far)
// See https://www.openmp.org/specifications/
// - default(none): no variables are shared by default
// - shared: as the name says
// - private: give each thread its own copy, without initialising
// - firstprivate: give each thread its own copy, and initialise with value from outside
#pragma omp parallel for default( none ) shared( allmomenta, allcouplings, allMEs, channelId, allNumerators, allDenominators )
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
#pragma omp parallel for default( none ) shared( allcouplings, allDenominators, allMEs, allmomenta, allNumerators, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, channelId, cNGoodHel, mgOnGpu::icolamp, MEs_ighel, npagV2 )
#else
#pragma omp parallel for default( none ) shared( allcouplings, allMEs, allmomenta, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, cNGoodHel, MEs_ighel, npagV2 )
#endif
#endif // _OPENMP
*/
for( int ipagV2 = 0; ipagV2 < npagV2; ++ipagV2 )
{
// Running sum of partial amplitudes squared for event by event color selection (#402)
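
Editor's note, not part of the diff: the clauses documented in the comment above behave as in the toy loop below; with default(none) every variable referenced inside the parallel region must be listed explicitly, e.g. in shared(...). Variable names and sizes here are illustrative, not the CPPProcess.cc kernel.

// Toy illustration of "#pragma omp parallel for default( none ) shared( ... )":
// each iteration writes a distinct element, so sharing the buffer is race-free.
#include <cstdio>
#include <vector>
int main()
{
  int nevt = 1024;                        // illustrative event count
  std::vector<double> allMEs( nevt, 0. ); // stand-in for the per-event output buffer
#ifdef _OPENMP
#pragma omp parallel for default( none ) shared( nevt, allMEs )
#endif
  for( int ievt = 0; ievt < nevt; ++ievt ) // the loop index is private to each thread
    allMEs[ievt] = 2. * ievt;
  std::printf( "allMEs[%d] = %f\n", nevt - 1, allMEs[nevt - 1] );
  return 0;
}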
@@ -81,7 +81,7 @@ namespace mgOnGpu
maxsize = std::max( maxsize, ip.first.size() );
maxsize = std::max( maxsize, totalKey.size() );
// Compute the overall total
size_t ipart = 0;
//size_t ipart = 0;
float total = 0;
//float totalBut2 = 0;
float total123 = 0;
@@ -100,7 +100,7 @@ namespace mgOnGpu
if( ip.first[0] == '2' ) total2 += ip.second;
if( ip.first[0] == '3' ) total3 += ip.second;
if( ip.first[0] == '3' && ip.first[1] == 'a' ) total3a += ip.second;
ipart++;
//ipart++;
}
// Dump individual partition timers and the overall total
if( json )
16 changes: 8 additions & 8 deletions epochX/cudacpp/ee_mumu.mad/CODEGEN_mad_ee_mumu_log.txt
@@ -57,7 +57,7 @@ generate e+ e- > mu+ mu-
No model currently active, so we import the Standard Model
INFO: load particles
INFO: load vertices
DEBUG: model prefixing takes 0.0068399906158447266 
DEBUG: model prefixing takes 0.006861448287963867 
INFO: Restrict model sm with file models/sm/restrict_default.dat .
DEBUG: Simplifying conditional expressions 
DEBUG: remove interactions: u s w+ at order: QED=1 
@@ -168,7 +168,7 @@ INFO: Organizing processes into subprocess groups
INFO: Generating Helas calls for process: e+ e- > mu+ mu- WEIGHTED<=4 @1
INFO: Processing color information for process: e+ e- > mu+ mu- @1
INFO: Creating files in directory P1_ll_ll
DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_SA_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7f1a6cfcedf0> [export_v4.py at line 6126] 
DEBUG: process_exporter_cpp =  <PLUGIN.CUDACPP_SA_OUTPUT.model_handling.PLUGIN_OneProcessExporter object at 0x7efceb51bdf0> [export_v4.py at line 6126] 
INFO: Creating files in directory .
DEBUG: Entering PLUGIN_OneProcessExporter.generate_process_files [model_handling.py at line 1199] 
DEBUG: self.include_multi_channel is already defined: this is madevent+second_exporter mode [model_handling.py at line 1201] 
@@ -201,19 +201,19 @@ INFO: Created files CPPProcess.h and CPPProcess.cc in directory ./.
INFO: Generating Feynman diagrams for Process: e+ e- > mu+ mu- WEIGHTED<=4 @1
INFO: Finding symmetric diagrams for subprocess group ll_ll
Generated helas calls for 1 subprocesses (2 diagrams) in 0.005 s
Wrote files for 8 helas calls in 0.118 s
Wrote files for 8 helas calls in 0.117 s
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV4 routines
ALOHA: aloha creates 3 routines in 0.240 s
ALOHA: aloha creates 3 routines in 0.241 s
DEBUG: Entering PLUGIN_ProcessExporter.convert_model (create the model) [output.py at line 181] 
ALOHA: aloha starts to compute helicity amplitudes
ALOHA: aloha creates FFV1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV4 routines
ALOHA: aloha creates FFV2_4 routines
ALOHA: aloha creates 7 routines in 0.305 s
ALOHA: aloha creates 7 routines in 0.307 s
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV1
<class 'aloha.create_aloha.AbstractRoutine'> FFV2
@@ -241,6 +241,6 @@ Type "launch" to generate events from this process, or see
Run "open index.html" to see more information about this process.
quit

real 0m2.425s
user 0m2.110s
sys 0m0.293s
real 0m2.451s
user 0m2.118s
sys 0m0.303s
11 changes: 7 additions & 4 deletions epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_ll_ll/CPPProcess.cc
@@ -901,16 +901,19 @@ namespace mg5amcCpu
#else
const int npagV2 = npagV; // loop on one SIMD page (neppV events) at a time
#endif
/*
#ifdef _OPENMP
// (NB gcc9 or higher, or clang, is required)
// OMP multithreading #575 (NB: tested only with gcc11 so far)
// See https://www.openmp.org/specifications/
// - default(none): no variables are shared by default
// - shared: as the name says
// - private: give each thread its own copy, without initialising
// - firstprivate: give each thread its own copy, and initialise with value from outside
#pragma omp parallel for default( none ) shared( allmomenta, allcouplings, allMEs, channelId, allNumerators, allDenominators )
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
#pragma omp parallel for default( none ) shared( allcouplings, allDenominators, allMEs, allmomenta, allNumerators, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, channelId, cNGoodHel, mgOnGpu::icolamp, MEs_ighel, npagV2 )
#else
#pragma omp parallel for default( none ) shared( allcouplings, allMEs, allmomenta, allrndcol, allrndhel, allselcol, allselhel, cGoodHel, cNGoodHel, MEs_ighel, npagV2 )
#endif
#endif // _OPENMP
*/
for( int ipagV2 = 0; ipagV2 < npagV2; ++ipagV2 )
{
// Running sum of partial amplitudes squared for event by event color selection (#402)
@@ -12,6 +12,7 @@
#include "RamboSamplingKernels.h"
#include "RandomNumberKernels.h"
#include "epoch_process_id.h"
#include "ompnumthreads.h"
#include "timermap.h"

#include <unistd.h>
2 changes: 2 additions & 0 deletions epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_ll_ll/driver.f
@@ -88,7 +88,9 @@ Program DRIVER
call cpu_time(t_before)
CUMULATED_TIMING = t_before

#ifdef _OPENMP
CALL OMPNUMTHREADS_NOT_SET_MEANS_ONE_THREAD()
#endif
CALL COUNTERS_INITIALISE()

c#ifdef MG5AMC_MEEXPORTER_CUDACPP
@@ -8,6 +8,7 @@
// Hence use 'extern "C"' to avoid name mangling by the C++ compiler
// See https://www.geeksforgeeks.org/extern-c-in-c

#ifdef _OPENMP
extern "C"
{
void ompnumthreads_not_set_means_one_thread_()
@@ -16,3 +17,4 @@ extern "C"
ompnumthreadsNotSetMeansOneThread( debuglevel ); // call the inline C++ function defined in the .h file
}
}
#endif