klas2 (SIMD CPU) + epoch1/epoch2 #152

Closed
314 commits
ef3dc34
Streamline the header. Add march to both Makefiles. Add another AVX51…
valassi Dec 7, 2020
567308f
Stick to AVX2 as default: not slower than AVX512, allows same random …
valassi Dec 7, 2020
887e977
Test SSE. This uses xmm registers, not ymm. But goes just as fast!
valassi Dec 7, 2020
0c7480a
Test without any -march flag. Is this without any vectorization? Thro…
valassi Dec 7, 2020
764c6c7
Remove alignas from vector type, not sure it was needed (did not see …
valassi Dec 7, 2020
6449a12
Remove all vectorization if AVX2/AVX512 are not defined. Throughput 4…
valassi Dec 8, 2020
fc7c4fd
Change compilation flags adding -mavx2: throughput almost x4 higher!
valassi Dec 8, 2020
5282c92
Change compilation flags adding -mavx2: throughput almost x4 higher!
valassi Dec 8, 2020
020894e
Merge branch 'klas' of https://gitlab.cern.ch:8443/valassi/madgraph4g…
valassi Dec 8, 2020
3f91707
On my Skylake, -march=core-avx2 is better than -mavx2. I get back 2.0…
valassi Dec 8, 2020
76fc2b8
Add printout SCALAR/AVX2/AVX512F. Test again SCALAR, ME 5.4E5
valassi Dec 8, 2020
7bde798
Test AVX512F, MEs 2.01E6
valassi Dec 8, 2020
898cdfc
Slightly different AVX512F, same MEs 2.01E6
valassi Dec 8, 2020
ba3f4c4
Slightly different AVX512F, a bit better 2.06E6 (remove -mprefer-vect…
valassi Dec 8, 2020
59e36b3
Go back to AVX2 default, MEs 2.08E6 (fastest, even slightly faster th…
valassi Dec 8, 2020
7541db7
Using FLOAT and SCALAR, I get 1.85E5 (was 5.4E5 with doubles, much fa…
valassi Dec 8, 2020
327b5c3
Move to short int for helicities (fixes issue #95, scalar int multipl…
valassi Dec 8, 2020
f1a14b2
Try AVX2 with floats. Results ok. Throughput up to 3.1E6!
valassi Dec 8, 2020
3128493
MEs all 0 with float AVX512. Bug somewhere...
valassi Dec 8, 2020
9a262bb
Improve AVX512/AVX2 ifdefs
valassi Dec 8, 2020
89be4a4
Fix helicity calculation as suggested long ago by Olivier: != last, n…
valassi Dec 8, 2020
8102462
BUG FIX in the helicity calculation: must try at least neppV events
valassi Dec 8, 2020
e09d92b
Back to the default DOUBLE AVX2 (fixed all issues with FLOAT).
valassi Dec 8, 2020
c65008e
Try AVX512 with 256 width as suggested by @sponce. Still worse than AVX2
valassi Dec 8, 2020
bd1c217
Revert "Try AVX512 with 256 width as suggested by @sponce. Still wors…
valassi Dec 8, 2020
6bb9708
Move ncolor, prepare to fix jamp
valassi Dec 8, 2020
51b9681
Move ncolors further up
valassi Dec 8, 2020
2d276bf
Vectorize also jamp. Almost no benefit
valassi Dec 8, 2020
3f11fa4
Move from C++11 to C++17. This is needed eventually for arrays of fpt…
valassi Dec 8, 2020
142cf50
Move the definition of fptype_sv and cxtype_sv from CPPProcess.cc to …
valassi Dec 8, 2020
83eea64
Move mgOnGpuVectors.h to ../../src
valassi Dec 8, 2020
a2f8cf9
Move momenta to AOSOA using fptype_v for the final array.
valassi Dec 8, 2020
635fd3c
Merge remote-tracking branch 'upstream/master' into klas
valassi Dec 8, 2020
d1134c1
Remove "alignas" from cxtype_v, not needed
valassi Dec 9, 2020
300e0d6
Merge remote-tracking branch 'upstream/master' into klas
valassi Dec 9, 2020
592aba5
Fix indentation of printout
valassi Dec 9, 2020
cdfd2bc
Fix the previous issue: move variables inside the OMP loop.
valassi Dec 9, 2020
891bf7b
Revert "Fix the previous issue: move variables inside the OMP loop."
valassi Dec 9, 2020
5e90bc8
Better fix: declare private variables in the OMP parallel for.
valassi Dec 9, 2020
037a34f
Merge remote-tracking branch 'upstream/master' into klas
valassi Dec 10, 2020
3e61fb8
Merge remote-tracking branch 'upstream/master' into klas
valassi Dec 10, 2020
6d297d1
Reverse order of headers as agreed with Stephan
valassi Dec 11, 2020
1561988
Fix runTest for scalar CPU build.
valassi Dec 11, 2020
062a4f4
Bug fix - resync rambo header with .cc (not used in CUDA anyway)
valassi Dec 11, 2020
d4bd686
Partial cleanup of fptype_sv usage in code (especially scalar code)
valassi Dec 11, 2020
27971c8
Passing by value is faster than by reference (?!).
valassi Dec 11, 2020
a0f92cd
Settle for return by value, performance looks good.
valassi Dec 11, 2020
f9e0509
Cleanup, remove dead comments
valassi Dec 11, 2020
ae5b29e
Simplify the code using _sv types
valassi Dec 11, 2020
5ac688f
Invert order of headers as suggested by Stephan
valassi Dec 11, 2020
8b1e97d
Simplify the ipar mapping function - avoid multidimensional arrays
valassi Dec 11, 2020
f6ee866
NEW: use hardcoded neppR=8 for physics reproducibility. Improve print…
valassi Dec 11, 2020
8c0c1c5
Merge remote-tracking branch 'upstream/master' into klas
valassi Dec 11, 2020
ce451e2
Support simultaneous builds with avx2, avx512, no AVX, in different b…
valassi Dec 12, 2020
4f35c24
Bug fix (in latest merge?): add npagV to OMP. Also reorder alphabetic…
valassi Mar 18, 2021
c67244b
Merge remote-tracking branch 'upstream/master' into klas
valassi Mar 18, 2021
5465bd3
Fix a build error in tests - go back to Stephan's grouped targets
valassi Mar 18, 2021
b7a228a
Build only the C++ if CUDA_HOME is invalid (hack to allow C++ only bu…
valassi Mar 18, 2021
a577c0a
Fix debug builds
valassi Mar 18, 2021
1a410e3
Better fix to build only C++ if CUDA_HOME is invalid
valassi Mar 18, 2021
0bff57d
Fix C++-only build (the GTESTLIBS dependency must be brought forward)
valassi Mar 18, 2021
ff31789
BUG FIX in C++ printout of momenta.
valassi Mar 19, 2021
978526b
bug fix in c++ test in runTest.cc: add getGoodHel and setGoodHel in c++
valassi Mar 19, 2021
d2a9bb0
Update the reference file after changing neppR=4 to neppR=8
valassi Mar 19, 2021
5223fa5
Allow epoch2 (just like epoch1) to set an invalid CUDA_HOME to force …
valassi Mar 19, 2021
a192241
Hardcode neppR=8 also in epoch2, as in epoch1
valassi Mar 19, 2021
5d558e6
Final (?) fix for the klas/SIMD PR: increase tolerance from 5.E-12 to…
valassi Mar 19, 2021
58d092e
Minimal changes to add clang support (assume CXX points to .../bin/cl…
valassi Dec 10, 2020
3df764d
Comment unused variable (clang complains)
valassi Dec 10, 2020
fe2da96
Issues with [] not being a ref for clang
valassi Dec 10, 2020
82249e7
Exact same issues with opencl in clang (probably uses these by default)
valassi Dec 10, 2020
a8e4ad5
Encapsulate clang-only changes in an #ifdef
valassi Mar 19, 2021
72e472b
Set AVX=none for clang builds (no support yet for SIMD in this code)
valassi Mar 19, 2021
5dd6c72
Complete patch for clang with hardcoded scalar handling (no SIMD yet)
valassi Mar 19, 2021
7b09d76
Fix clang build warning
valassi Mar 19, 2021
d056ec2
Bug fix for 'make AVX=none'. Note that klas improved speedup even wit…
valassi Mar 20, 2021
2d3d370
Add 'make cleanall' to clean all AVX tags
valassi Mar 20, 2021
874f198
Ensure that 'make distclean' cleans all AVX tags for this compiler
valassi Mar 20, 2021
1b26634
Merge branch 'nocuda' into klas - this is a NOOP.
valassi Mar 20, 2021
39a2ca4
Merge branch 'klas' into klas2
valassi Mar 20, 2021
aac18af
Merge branch 'klas2' of https://gitlab.cern.ch:8443/valassi/madgraph4…
valassi Mar 20, 2021
65e05f7
Merge remote-tracking branch 'upstream/master' into klas
valassi Mar 21, 2021
ea084da
Simplify clang makefile: use -lgomp as libgomp.so is a symlink to lib…
valassi Mar 21, 2021
a7a2f02
Merge remote-tracking branch 'upstream/master' into klas
valassi Mar 21, 2021
c29f8f8
Merge branch 'klas' into klas2
valassi Mar 21, 2021
645836d
Merge branch 'klas2' of https://gitlab.cern.ch:8443/valassi/madgraph4…
valassi Mar 21, 2021
5f138ce
Fix a build error for the test on gcc10 with AVX=none
valassi Mar 21, 2021
b3bd06a
Reduce differences to present master to help debugging
valassi Mar 21, 2021
a0e44e2
Add support for SSE4.2: as expected, almost a factor ~ 2x over no SIMD
valassi Mar 22, 2021
353ccbc
Thanks to @sponce use AVX512 with 256 widths: gain 10% over AVX2!
valassi Mar 22, 2021
f210916
Try to replace -march=native by -mavx512f -mavx512cd -mprefer-vector-…
valassi Mar 23, 2021
e53b385
Not better with -mavx512f -mprefer-vector-width=256
valassi Mar 23, 2021
1b5fb48
Not better with -mavx512f alone
valassi Mar 23, 2021
40124be
Replace -march=native by -march=skylake-avx512: more portable, same p…
valassi Mar 23, 2021
71131d9
Replace -march=core-avx2 by -march=haswell for AVX2: more modern gcc,…
valassi Mar 23, 2021
9c2fd2a
Move default from avx2 to avx512 (10% faster)
valassi Mar 23, 2021
9156d26
Improve comments about march
valassi Mar 23, 2021
843fa05
For avx512, add -mprefer-vector-width=256, no change in performance
valassi Mar 23, 2021
b129a39
Add a few comments about the 256 width
valassi Mar 23, 2021
996d414
Fail gently and avoid "Illegal instruction" if the host does not supp…
valassi Mar 23, 2021
6d89aab
cosmetics - Mx clean
valassi Mar 23, 2021
ecc9772
For the moment, use AVXFLAGS everywhere: eventually, use them only in…
valassi Mar 23, 2021
46f3c47
Comment out the __builtin_cpu_supports check, it cannot work as-is
valassi Mar 23, 2021
ec02236
Set AVX='avx2' if /proc/cpuinfo does not support avx512f
valassi Mar 24, 2021
e3e79c5
Override AVX=avx512 user-specified if the host does not support it
valassi Mar 24, 2021
88ed90b
Fix a build warning in a system call
valassi Mar 24, 2021
36e370a
Add a target 'make avxall'
valassi Mar 24, 2021
ea4a71a
Revert "Override AVX=avx512 user-specified if the host does not suppo…
valassi Mar 24, 2021
6f4d6d6
Improve handling of AVX choices. Allow a host to build avx512 even if…
valassi Mar 24, 2021
556d3b7
Reenable the __builtin_cpu_supports check, this is better than nothing.
valassi Mar 24, 2021
07e6ba4
Silence the AVX choice
valassi Mar 24, 2021
29e4de3
Add host information to 'make info'
valassi Mar 24, 2021
6f8e184
Add 'make info' at the beginning of the github CI
valassi Mar 24, 2021
1d24d88
Add host information to 'make info' also to epoch2/eemumu
valassi Mar 25, 2021
824d841
Better organization of __builtin_cpu_supports checks
valassi Mar 25, 2021
714f074
Try a different implementation of AVX dispatching: does not build (ic…
valassi Mar 25, 2021
cadeb13
Yet another attempt at runtime CPU dispatching. Does not work as expe…
valassi Mar 25, 2021
495597e
Revert "Yet another attempt at runtime CPU dispatching. Does not work…
valassi Mar 25, 2021
624f3de
Revert "Try a different implementation of AVX dispatching: does not b…
valassi Mar 25, 2021
3e2f42f
Further improve the organization of __builtin_cpu_supports checks
valassi Mar 25, 2021
9640a20
Merge remote-tracking branch 'upstream/master' into klas
valassi Mar 30, 2021
81e7c9e
Merge remote-tracking branch 'upstream/master' into klas2
valassi Mar 30, 2021
5cde555
Merge remote-tracking branch 'upstream/master' into klas
valassi Apr 1, 2021
a93833e
Merge branch 'klas' into klas2
valassi Apr 1, 2021
7d0ad39
Merge remote-tracking branch 'upstream/master' into klas
valassi Apr 1, 2021
f965112
Merge branch 'klas' into klas2
valassi Apr 1, 2021
3efa5fa
Merge remote-tracking branch 'ghav/ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
9b0f92d
[ep2to2ep1] CPPProcess.cc - epoch2 Comment out unused functions
valassi Apr 1, 2021
38683d9
[ep2to2ep1] check.cc Add Process printout in both epoch1 and epoch2
valassi Apr 1, 2021
0e8332c
[ep2to2ep1] Add script throughput12.sh to check performances as the c…
valassi Apr 2, 2021
30e16d1
[ep2to2ep1] check.cc Improve Epoch/Process/Language printout in both e…
valassi Apr 2, 2021
8481177
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
2e8539c
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
df2dff8
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
b1927db
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
5731152
[klas2ep1ep2Bis] First time measurement for klas2 + ep2ep1
valassi Apr 2, 2021
6de083a
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
2317a34
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
6e23fd4
Merge branch 'ep2to2ep1' into klas2ep1ep2Bis
valassi Apr 2, 2021
abe3e1e
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 4, 2021
a0d400a
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 4, 2021
9912236
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 4, 2021
64fc041
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
d04b325
[klas2ep12] minor fix in getGoodHel (check on isGoodHel only needed t…
valassi Apr 5, 2021
f953dd8
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
f32f334
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
93e5928
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
6c1718a
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
6412518
[klas2] First fix for bug #136 : link MadGraphtest.o to runTest.exe.
valassi Apr 5, 2021
715ddf9
[klas2] Complete fix for bug #136 : link runTest.o to runTest.exe.
valassi Apr 5, 2021
009e9b6
Merge remote-tracking branch 'upstream/master' into klas2
valassi Apr 5, 2021
52d7c45
Merge branch 'klas2' into klas2ep12
valassi Apr 5, 2021
251bb53
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
f8209ca
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
f968966
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
0993684
[klas2ep12] CPPProcess.cc : epoch1 clean up after the previous merge …
valassi Apr 5, 2021
03ecdae
Merge branch 'ep2to2ep1' into klas2ep12
valassi Apr 5, 2021
d387c3d
[klas2ep12] Add operator cxtype_v& operator-= and improve previous merge
valassi Apr 5, 2021
b86640c
[klas2ep12] fix bug (OMP initialization) in the previous merge of CPP…
valassi Apr 5, 2021
681eb9a
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
04741df
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
54f5c71
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
2845f64
[klas2ep12] CPPProcess.cc : bug fix in a previous merge (move cHel to…
valassi Apr 6, 2021
7aeb492
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
33cfbce
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
0611e95
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12 and fi…
valassi Apr 6, 2021
d080ac5
[klas2ep12] CPPProcess.cc epoch1 vectorize opzxxx
valassi Apr 6, 2021
6d3b04e
[klas2ep12] mgOnGpuVectors.h, CPPProcess.c - replace cxmaker0(r) by c…
valassi Apr 6, 2021
8e722fa
[klas2ep12] mgOnGpuVectors.h - clean up (remove or comment out) unuse…
valassi Apr 6, 2021
7265976
[klas2ep12] mgOnGpuVectors.h - move all cxmake00 at the end (prepare …
valassi Apr 6, 2021
9b6f366
[klas2ep12] mgOnGpuVectors.h - rename cxmake00() as cxzero_sv()
valassi Apr 6, 2021
f4dca6f
[klas2ep12] CPPProcess.cc - further simplify, use the same formal cod…
valassi Apr 6, 2021
1b5a9b3
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
70566cc
[klas2ep12] Vectorize imzxxx/ipzxxx from the previous merge
valassi Apr 6, 2021
31be7bb
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
fb2d2f9
[klas2ep12] CPPProcess.cc Replace CZERO by cxzero_sv()
valassi Apr 6, 2021
f962c06
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
7e1e62e
[klas2ep12] Vectorize oxzxxx from the previous merge
valassi Apr 6, 2021
c6d92f3
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 6, 2021
b875f97
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
2654ed7
[klas2ep12] CPPProcess.cc epoch1: uncomment vectorized versions, comm…
valassi Apr 7, 2021
790bbc0
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
260fd96
[klas2ep12] CPPProcess.cc epoch1: VECTORIZE ALL FFV FUNCTIONS FROM SC…
valassi Apr 7, 2021
ced3a2e
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
1493dc5
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
7c250ad
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
a83e69c
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
abae909
[klas2ep12] CPPProcess.cc epoch1 : add (commented out) all non vector…
valassi Apr 7, 2021
b7a5546
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 7, 2021
5ea7e29
[klas2ep12] CPPProcess.cc epoch1 : remove the unused OLD ixxxxx, ipzx…
valassi Apr 7, 2021
89d1497
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 8, 2021
64f340a
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 8, 2021
f93558b
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 8, 2021
29a3d27
Merge remote-tracking branch 'origin/ep2to2ep1' into klas2ep12
valassi Apr 8, 2021
870b8b3
*** FINAL MERGE OF 'origin/ep2to2ep1' into klas2ep12 ***
valassi Apr 8, 2021
0340bf7
Merge remote-tracking branch 'upstream/master' into klas2ep12
valassi Apr 8, 2021
166504e
Merge remote-tracking branch 'upstream/master' into klas2ep12
valassi Apr 9, 2021
afc8b7f
[klas2ep12] Makefile bug fix: use "override" instead of ":=" to ensur…
valassi Apr 9, 2021
35b9739
[klas2ep12] Build locally by default; use build.$(AVX) only if USEBUI…
valassi Apr 9, 2021
760b058
[klas2ep12] Makefile prevent mixing of different AVX modes in the sam…
valassi Apr 9, 2021
47f26c7
[klas2ep12] Makefile improve verbosity
valassi Apr 9, 2021
bc8924b
[klas2ep12] Makefile bug fix for checktag: make dir if missing
valassi Apr 9, 2021
1b29448
[klas2ep12] throughput12.sh fix the script to always use USEBUILDDIR=1
valassi Apr 9, 2021
45337f3
[klas2ep12] Move pointer definitions from check.cc to Memory.h, adjus…
valassi Apr 11, 2021
dd71d16
[klas2ep12] bug fix - src/Makefile prevent mixing of different AVX mo…
valassi Apr 11, 2021
e1b837f
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 11, 2021
e637da1
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 11, 2021
d386623
[klas2ep12] HelAmps.h comment out xxx functions with old interface, t…
valassi Apr 11, 2021
ec4a0ed
[klas2ep12] HelAmps.h epoch1 add the four currently vectorised xxx in…
valassi Apr 11, 2021
e765b01
[klas2ep12] build a simple testxxx for imzxxx (inline templates, move…
valassi Apr 12, 2021
daf590b
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 12, 2021
fc8be7b
[klas2ep12] fix the vectorized testxxx for imzxxx
valassi Apr 12, 2021
f7c3849
[klas2ep12] epoch1 Makefile improve mechanism to prevent mixing AVX m…
valassi Apr 12, 2021
11955d5
[klas2ep12] add vectorized testxxx for ixzxxx, opzxxx, oxzxxx
valassi Apr 12, 2021
d21853e
[klas2ep12] epoch1 re-vectorize from scratch all xxx functions with …
valassi Apr 12, 2021
aca5d20
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 13, 2021
ab6ecaa
[klas2ep12] epoch1/2 mgOnGpuConfig.h add (commented out) empty "#defi…
valassi Apr 13, 2021
8a56201
[klas2ep12] large patch: vectorize ixxxxx and add a test
valassi Apr 13, 2021
2493022
[klas2ep12] epoch1/2 throughput12.sh remove sudo before calling ncu (…
valassi Apr 13, 2021
042a361
[klas2ep12] vectorize vxxxxx and sxxxxx and add their two tests
valassi Apr 13, 2021
32f7dd5
[klas2ep12] vectorize oxxxxx and add its test - this COMPLETES the ve…
valassi Apr 13, 2021
7d4989f
[klas2ep12] testxxx.cc cleanup now that all tests are implemented and…
valassi Apr 13, 2021
96412d1
[klas2ep12] cleanup CPPProcess.cc and HelAmps.h: remove my old initia…
valassi Apr 14, 2021
a0d77a9
[klas2ep12] testxxx.cc clean up - comment out a printout
valassi Apr 14, 2021
d03f144
[klas2ep12] rambo.cc epoch2 - cosmetics closer to epoch1
valassi Apr 14, 2021
902879d
[klas2ep12] HelAmps.h epoch2 - cosmetics closer to epoch1
valassi Apr 14, 2021
fb43a4b
[klas2ep12] first test moving all xxx/ffv to helamps: easy! but will …
valassi Apr 14, 2021
5dadd1b
Revert "[klas2ep12] first test moving all xxx/ffv to helamps: easy! b…
valassi Apr 14, 2021
8fee59d
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 14, 2021
beb03c1
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 15, 2021
7e33982
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 15, 2021
f95da20
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 15, 2021
3d7f428
Merge remote-tracking branch 'origin/testxxx' into klas2ep12
valassi Apr 15, 2021
4d45c89
Merge remote-tracking branch 'origin/testxxx' into klas2ep12 - only c…
valassi Apr 15, 2021
7c21ddd
Merge remote-tracking branch 'upstream/master' into klas2ep12
valassi Apr 15, 2021
19a1ca0
[klas2ep12] check.cc epoch1 improve order of SIMD #ifdef checks (clan…
valassi Apr 18, 2021
0299e0c
[klas2ep12] WIP on a script to dup the simd symbols
valassi Apr 19, 2021
e9fa2da
[klas2ep12] epoch1 simdgrep.sh reorder symbols to have zmm at the end
valassi Apr 19, 2021
c53688d
[klas2ep12] epoch1 simdgrep.sh improve categorization
valassi Apr 19, 2021
9ac973f
[klas2ep12] epoch1 simdgrep.sh add totals in all.symlist
valassi Apr 19, 2021
4361f67
[klas2ep12] epoch1 simdgrep.sh finalise a summary of symbols
valassi Apr 19, 2021
4de9bcf
[klas2ep12] rename simdgrep.sh as simdSym.sh
valassi Apr 19, 2021
f8328c4
[klas2ep12] epoch1 Move simd symbol summary to a separate script
valassi Apr 19, 2021
0f90258
[klas2ep12] epoch1 optionally select only the helamps namespace in si…
valassi Apr 19, 2021
41c31b6
[klas2ep12] epoch1 add '512z' build and rename all other simd builds
valassi Apr 19, 2021
6f11bca
[klas2ep12] throughput12.sh Dump simd symbol summary in both epochs
valassi Apr 19, 2021
0ccf30c
[klas2ep12] cleanup the throughput script
valassi Apr 19, 2021
6ead063
Merge remote-tracking branch 'origin/ep12float' into klas2ep12
valassi Apr 19, 2021
61ff616
Merge remote-tracking branch 'origin/ep12float' into klas2ep12
valassi Apr 21, 2021
e8e609c
Merge remote-tracking branch 'upstream/master' into klas2ep12
valassi Apr 21, 2021
169 changes: 130 additions & 39 deletions epoch1/cuda/ee_mumu/SubProcesses/Makefile
@@ -1,16 +1,62 @@
LIBDIR = ../../lib
TOOLSDIR = ../../../../../tools
TESTDIR = ../../../../../test
INCFLAGS = -I. -I../../src -I$(TOOLSDIR)
MODELLIB = model_sm
OPTFLAGS = -O3 # this ends up in CUFLAGS too (should it?), cannot add -Ofast or -ffast-math here
OMPFLAGS?= -fopenmp
CXXFLAGS = $(OPTFLAGS) -std=c++11 $(INCFLAGS) $(USE_NVTX) -Wall -Wshadow -Wextra $(OMPFLAGS) $(MGONGPU_CONFIG)
CXXFLAGS = $(OPTFLAGS) -std=c++17 $(INCFLAGS) $(USE_NVTX) -Wall -Wshadow -Wextra $(OMPFLAGS) $(MGONGPU_CONFIG)
CXXFLAGS+= -ffast-math # see issue #117
###CXXFLAGS+= -Ofast # performance is not different from --fast-math
LIBFLAGS = -L$(LIBDIR) -l$(MODELLIB)
CXX ?= g++

# AVX choice (example: "make AVX=none")
ifneq ($(findstring clang++,$(CXX)),)
override AVX = none
$(warning Using AVX='$(AVX)' for clang builds)
else ifneq ($(AVX),)
###$(info Using AVX='$(AVX)' according to user input)
else ifneq ($(shell grep -c avx512vl /proc/cpuinfo),0)
override AVX = 512y
###$(info Using AVX='$(AVX)' as no user input exists)
else
override AVX = avx2
$(warning Using AVX='$(AVX)' as no user input exists and host does not support avx512vl)
endif
###$(info AVX=$(AVX))

# Set the build flags appropriate to each AVX
# [NB MGONGPU_PVW512 is needed because "-mprefer-vector-width=256" is not exposed in a macro]
# [See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96476]
ifeq ($(AVX),sse4)
override AVXFLAGS = -march=nehalem # SSE4.2 with 128 width (xmm registers)
else ifeq ($(AVX),avx2)
override AVXFLAGS = -march=haswell # AVX2 with 256 width (ymm registers)
else ifeq ($(AVX),512y)
override AVXFLAGS = -march=skylake-avx512 -mprefer-vector-width=256 # AVX512 with 256 width (ymm registers) [DEFAULT!]
else ifeq ($(AVX),512z)
override AVXFLAGS = -march=skylake-avx512 -DMGONGPU_PVW512 # AVX512 with 512 width (zmm registers)
else ifneq ($(AVX),none)
$(error Unknown AVX='$(AVX)': only 'none', 'sse4', 'avx2', '512y' and '512z' are supported)
endif

# For the moment, use AVXFLAGS everywhere: eventually, use them only in encapsulated implementations
CXXFLAGS+= $(AVXFLAGS)

# Build tag (defines target and path to the optional build directory)
override TAG = $(AVX)

# Build directory: current directory by default, or build.$(TAG) if USEBUILDDIR==1
ifeq ($(USEBUILDDIR),1)
override BUILDDIR = build.$(TAG)
override LIBDIR = ../../lib/$(BUILDDIR)
else
override BUILDDIR = .
override LIBDIR = ../../lib
endif
###$(info BUILDDIR=$(BUILDDIR))
$(info Building in BUILDDIR=$(BUILDDIR) for tag=$(TAG))

# If CUDA_HOME is not set, try to set it from the location of nvcc
ifndef CUDA_HOME
NVCC ?= $(shell which nvcc 2>/dev/null)
@@ -37,8 +83,8 @@ ifneq ($(wildcard $(CUDA_HOME)/bin/nvcc),)
###CUFLAGS+= --maxrregcount 128 # improves throughput: 7.3E8 (16384 32 12) up to 7.6E8 (65536 128 12)
###CUFLAGS+= --maxrregcount 96 # degrades throughput: 4.1E8 (16384 32 12) up to 4.5E8 (65536 128 12)
###CUFLAGS+= --maxrregcount 64 # degrades throughput: 1.7E8 (16384 32 12) flat at 1.7E8 (65536 128 12)
cu_main = gcheck.exe
cu_objects = gCPPProcess.o
cu_main = $(BUILDDIR)/gcheck.exe
cu_objects = $(BUILDDIR)/gCPPProcess.o
else
# No cuda. Switch cuda compilation off and go to common random numbers in C++
NVCC := $(warning CUDA_HOME is not set or is invalid. Export CUDA_HOME to compile with cuda)
@@ -51,97 +97,142 @@ endif

GTESTLIBDIR = $(TESTDIR)/googletest/build/lib/
GTESTLIBS = $(GTESTLIBDIR)/libgtest.a $(GTESTLIBDIR)/libgtest_main.a

MAKEDEBUG=

cxx_main=check.exe
cxx_objects=CPPProcess.o
cxx_main=$(BUILDDIR)/check.exe
cxx_objects=$(BUILDDIR)/CPPProcess.o

testmain=$(BUILDDIR)/runTest.exe

# Assuming uname is available, detect if architecture is power
UNAME_P := $(shell uname -p)
ifeq ($(UNAME_P),ppc64le)
CUFLAGS+= -Xcompiler -mno-float128
endif

all: ../../src $(cu_main) $(cxx_main) runTest.exe
all.$(TAG): $(BUILDDIR)/build.tag_$(TAG) ../../src $(cu_main) $(cxx_main) $(testmain)

override oldtags=`find $(BUILDDIR) -maxdepth 1 -name 'build.tag_*' ! -name 'build.tag_$(TAG)'`
$(BUILDDIR)/build.tag_$(TAG):
@if [ ! -d $(BUILDDIR) ]; then echo "mkdir $(BUILDDIR)"; mkdir $(BUILDDIR); fi
@if [ "$(oldtags)" != "" ]; then echo -e "Cannot build for tag=$(TAG) as old builds exist for other tags:\n$(oldtags)\nPlease run 'make clean' first or consider using 'make USEBUILDDIR=1 AVX=$(AVX)'"; exit 1; fi
@touch $(BUILDDIR)/build.tag_$(TAG)

debug: OPTFLAGS = -g -O0 -DDEBUG2
debug: CUOPTFLAGS = -G
debug: MAKEDEBUG := debug
debug: all
debug: all.$(TAG)

$(LIBDIR)/lib$(MODELLIB).a: ../../src/*.h ../../src/*.cc
$(MAKE) -C ../../src $(MAKEDEBUG)
$(MAKE) -C ../../src AVX=$(AVX) $(MAKEDEBUG)

gcheck.o: gcheck.cu *.h ../../src/*.h ../../src/*.cu
$(BUILDDIR)/gcheck.o: gcheck.cu *.h ../../src/*.h ../../src/*.cu
@if [ ! -d $(BUILDDIR) ]; then mkdir $(BUILDDIR); fi
$(NVCC) $(CPPFLAGS) $(CUFLAGS) -c $< -o $@

%.o : %.cu *.h ../../src/*.h
$(BUILDDIR)/%.o : %.cu *.h ../../src/*.h
@if [ ! -d $(BUILDDIR) ]; then mkdir $(BUILDDIR); fi
$(NVCC) $(CPPFLAGS) $(CUFLAGS) -c $< -o $@

%.o : %.cc *.h ../../src/*.h
#$(BUILDDIR)/CPPProcess.o : CPPProcess.cc *.h ../../src/*.h
# @if [ ! -d $(BUILDDIR) ]; then mkdir $(BUILDDIR); fi
# $(CXX) $(CPPFLAGS) $(CXXFLAGS) $(AVXFLAGS) $(CUINC) -c $< -o $@

$(BUILDDIR)/%.o : %.cc *.h ../../src/*.h
@if [ ! -d $(BUILDDIR) ]; then mkdir $(BUILDDIR); fi
$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(CUINC) -c $< -o $@

$(cu_main): gcheck.o $(LIBDIR)/lib$(MODELLIB).a $(cu_objects)
$(cu_main): $(BUILDDIR)/gcheck.o $(LIBDIR)/lib$(MODELLIB).a $(cu_objects)
$(NVCC) $< -o $@ $(cu_objects) $(CUARCHFLAGS) $(LIBFLAGS) $(CULIBFLAGS)

$(cxx_main): check.o $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects)
$(cxx_main): $(BUILDDIR)/check.o $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects)
$(CXX) $< -o $@ $(cxx_objects) $(CPPFLAGS) $(CXXFLAGS) -ldl -pthread $(LIBFLAGS) $(CULIBFLAGS)

runTest.o: $(GTESTLIBS)
runTest.exe: $(GTESTLIBS)
runTest.exe: INCFLAGS += -I$(TESTDIR)/googletest/googletest/include/
runTest.exe: INCFLAGS += -I$(TESTDIR)/include/
runTest.exe: LIBFLAGS += -L$(GTESTLIBDIR)/ -lgtest -lgtest_main
runTest.exe: runTest.o $(TESTDIR)/src/MadgraphTest.o $(TESTDIR)/include/*.h
runTest.exe: cxx_objects += runTest.o $(TESTDIR)/src/MadgraphTest.o
runTest.exe: cu_objects += runTest_cu.o
$(BUILDDIR)/runTest.o: $(GTESTLIBS)
$(testmain): $(GTESTLIBS)
$(testmain): INCFLAGS += -I$(TESTDIR)/googletest/googletest/include
$(testmain): INCFLAGS += -I$(TESTDIR)/include
$(testmain): LIBFLAGS += -L$(GTESTLIBDIR) -lgtest -lgtest_main
$(testmain): $(BUILDDIR)/runTest.o $(TESTDIR)/src/MadgraphTest.o $(TESTDIR)/include/*.h
$(testmain): cxx_objects += $(BUILDDIR)/runTest.o $(TESTDIR)/src/MadgraphTest.o
$(testmain): cu_objects += $(BUILDDIR)/runTest_cu.o
ifneq ($(findstring clang++,$(CXX)),)
runTest.exe: LIBFLAGS += -L$(patsubst %bin/clang++,%lib,$(CXX))
$(testmain): LIBFLAGS += -L$(patsubst %bin/clang++,%lib,$(CXX))
endif

testxxx.o: $(GTESTLIBS)
testxxx.o: testxxx_cc_ref.txt
runTest.exe: testxxx.o
runTest.exe: cxx_objects += testxxx.o # Comment out this line to skip the test of xxx functions
$(BUILDDIR)/testxxx.o: $(GTESTLIBS)
$(BUILDDIR)/testxxx.o: testxxx_cc_ref.txt
$(testmain): $(BUILDDIR)/testxxx.o
$(testmain): cxx_objects += $(BUILDDIR)/testxxx.o # Comment out this line to skip the test of xxx functions

ifeq ($(NVCC),)
# Link only runTest.o
runTest.exe: $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects) $(GTESTLIBS)
$(testmain): $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects) $(GTESTLIBS)
$(CXX) -o $@ $(cxx_objects) $(CPPFLAGS) $(CXXFLAGS) -ldl -pthread $(LIBFLAGS) $(CULIBFLAGS)
else
# Link both runTest.o and runTest_cu.o
runTest.exe runTest_cu.o &: runTest.cc $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects) $(cu_objects) $(GTESTLIBS)
$(NVCC) -o runTest_cu.o -c -x cu runTest.cc $(CPPFLAGS) $(CUFLAGS)
$(testmain) $(BUILDDIR)/runTest_cu.o &: runTest.cc $(LIBDIR)/lib$(MODELLIB).a $(cxx_objects) $(cu_objects) $(GTESTLIBS)
$(NVCC) -o $(BUILDDIR)/runTest_cu.o -c -x cu runTest.cc $(CPPFLAGS) $(CUFLAGS)
$(NVCC) -o $@ $(cxx_objects) $(cu_objects) $(CPPFLAGS) $(CUFLAGS) -ldl $(LIBFLAGS) $(CULIBFLAGS) -lcuda -lgomp
endif

$(GTESTLIBS):
$(MAKE) -C $(TESTDIR)

check: runTest.exe
./runTest.exe
check: $(testmain)
$(testmain)

.PHONY: clean

clean:
make -C ../../src clean
rm -f *.o *.exe
make -C ../../src AVX=$(AVX) clean
rm -f $(BUILDDIR)/build.tag*
ifneq ($(BUILDDIR),.)
rm -rf $(BUILDDIR)
else
rm -f $(BUILDDIR)/*.o $(BUILDDIR)/*.exe
endif

avxall:
@echo
make USEBUILDDIR=1 AVX=none
@echo
make USEBUILDDIR=1 AVX=sse4
@echo
make USEBUILDDIR=1 AVX=avx2
@echo
make USEBUILDDIR=1 AVX=512y
@echo
make USEBUILDDIR=1 AVX=512z

cleanall: clean
cleanall:
@echo
make clean
@echo
make USEBUILDDIR=1 AVX=none clean; make -C ../../src USEBUILDDIR=1 AVX=none clean
@echo
make USEBUILDDIR=1 AVX=sse4 clean; make -C ../../src USEBUILDDIR=1 AVX=sse4 clean
@echo
make USEBUILDDIR=1 AVX=avx2 clean; make -C ../../src USEBUILDDIR=1 AVX=avx2 clean
@echo
make USEBUILDDIR=1 AVX=512y clean; make -C ../../src USEBUILDDIR=1 AVX=512y clean
@echo
make USEBUILDDIR=1 AVX=512z clean; make -C ../../src USEBUILDDIR=1 AVX=512z clean

distclean: clean
distclean: cleanall
make -C $(TOOLSDIR) clean
make -C $(TESTDIR) clean

memcheck: $(cu_main)
/usr/local/cuda/bin/cuda-memcheck --check-api-memory-access yes --check-deprecated-instr yes --check-device-heap yes --demangle full --language c --leak-check full --racecheck-report all --report-api-errors all --show-backtrace yes --tool memcheck --track-unused-memory yes ./gcheck.exe 2 32 2
/usr/local/cuda/bin/cuda-memcheck --check-api-memory-access yes --check-deprecated-instr yes --check-device-heap yes --demangle full --language c --leak-check full --racecheck-report all --report-api-errors all --show-backtrace yes --tool memcheck --track-unused-memory yes $(BUILDDIR)/gcheck.exe 2 32 2

perf: force
make clean && make
time ./gcheck.exe -p 16348 32 12 && date
time $(BUILDDIR)/gcheck.exe -p 16348 32 12 && date

test: force
./gcheck.exe -v 1 32 1
$(BUILDDIR)/gcheck.exe -v 1 32 1

info:
@hostname
41 changes: 22 additions & 19 deletions epoch1/cuda/ee_mumu/SubProcesses/Memory.h
@@ -6,20 +6,16 @@
*/

#ifndef MEMORY_H
#define MEMORY_H
#define MEMORY_H 1

#include "mgOnGpuConfig.h"
#include "mgOnGpuTypes.h"
#include "mgOnGpuVectors.h"

#include <memory>

template<typename T = fptype>
struct CudaHstDeleter {
void operator()(T* mem) {
checkCuda( cudaFreeHost( mem ) );
}
};

#ifdef __CUDACC__

template<typename T = fptype>
struct CudaDevDeleter {
void operator()(T* mem) {
@@ -28,31 +24,38 @@ struct CudaDevDeleter {
};

template<typename T = fptype>
using unique_ptr_dev = std::unique_ptr<T, CudaDevDeleter<T>>;

template<typename T = fptype>
unique_ptr_dev<T> devMakeUnique(std::size_t N) {
std::unique_ptr<T, CudaDevDeleter<T>> devMakeUnique(std::size_t N) {
T* tmp = nullptr;
checkCuda( cudaMalloc( &tmp, N * sizeof(T) ) );
return std::unique_ptr<T, CudaDevDeleter<T>>{ tmp };
}

template<typename T = fptype>
using unique_ptr_host = std::unique_ptr<T[], CudaHstDeleter<T>>;
struct CudaHstDeleter {
void operator()(T* mem) {
checkCuda( cudaFreeHost( mem ) );
}
};

template<typename T = fptype>
unique_ptr_host<T> hstMakeUnique(std::size_t N) {
std::unique_ptr<T[], CudaHstDeleter<T>> hstMakeUnique(std::size_t N) {
T* tmp = nullptr;
checkCuda( cudaMallocHost( &tmp, N * sizeof(T) ) );
return std::unique_ptr<T[], CudaHstDeleter<T>>{ tmp };
};

#else
template<typename T = fptype>
using unique_ptr_host = std::unique_ptr<T[]>;
template<typename T = fptype>
std::unique_ptr<T[]> hstMakeUnique(std::size_t N) { return std::unique_ptr<T[]>{ new T[N] }; };
#endif

template<typename T = fptype> inline
std::unique_ptr<T[]> hstMakeUnique(std::size_t N) { return std::unique_ptr<T[]>{ new T[N]() }; };

#ifdef MGONGPU_CPPSIMD

template<> inline
std::unique_ptr<fptype_v[]> hstMakeUnique(std::size_t N) { return std::unique_ptr<fptype_v[]>{ new fptype_v[N/neppV]() }; };

#endif

#endif

#endif /* MEMORY_H */