Shared libraries + Bridge + Cleaner Makefiles #367
…e shared libraries
Cherry-pick commit 5df671b of 'roiser/sharedlib' (only commit in that branch) into shared (PR madgraph5#361)
Fix conflicts in epochX/cudacpp/gg_tt/SubProcesses/Makefile: add OMPFLAGS and remove CXXFLAGS and CPPFLAGS when linking cxx_main
The dependency of check.exe and runTest.exe on the shared libraries is not ideal at the moment. First problem: the executables are no longer statically built, so one needs an LD_LIBRARY_PATH (ok, maybe that's what we want...)
Second problem: there seem to be some hardcoded paths.
I would be tempted to embed libmodel inside libcxx/libcuda. This actually means replicating some symbols, but for the moment I guess we use EITHER one OR the other? Eventually, if we have all of them together, should we have them inside a single library?... Something to be clarified, related to heterogeneous apps #318
Hi @valassi, concerning having one shared library or many: I kept them separate on purpose, as duplicating the symbols didn't make much sense to me. Best
…ocument the various parts of the Makefile
…er and document the various parts of the Makefile
…ve OMPFLAGS, AVX, FPTYPE, HELINL, HRDCOD, RNDGEN
…ut this was not used anywhere
…) - reorder and document the various parts of the Makefile
… - reorder and document the various parts of the Makefile
…of the full .so path
Before the fix:
ldd ../../lib/libmg5amc_cxx.so
    linux-vdso.so.1 => (0x00007ffda0f64000)
    ../../lib/libmg5amc_common.so (0x00007f01674e1000)
    libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libstdc++.so.6 (0x00007f0167115000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f0166e13000)
    libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libgcc_s.so.1 (0x00007f01674a7000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f0166a45000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f01672ea000)
After the fix:
ldd ../../lib/libmg5amc_cxx.so
    linux-vdso.so.1 => (0x00007ffd79dbc000)
    libmg5amc_common.so => not found
    libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libstdc++.so.6 (0x00007f2c55a11000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f2c5570f000)
    libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libgcc_s.so.1 (0x00007f2c55db5000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f2c55341000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2c55be6000)
…tead of the full .so path
Before the fix:
ldd ./check.exe
    linux-vdso.so.1 => (0x00007fff76344000)
    ../../lib/libmg5amc_cxx.so (0x00007f1c71c75000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f1c71868000)
    libmg5amc_common.so => not found
    libcurand.so.10 => /usr/local/cuda-10.2/lib64/libcurand.so.10 (0x00007f1c6d7c5000)
    libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libstdc++.so.6 (0x00007f1c6d5f0000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f1c6d2ee000)
    libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libgcc_s.so.1 (0x00007f1c71c3a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c6d0d2000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f1c6cd04000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c71a6c000)
    libmg5amc_common.so => not found
    librt.so.1 => /lib64/librt.so.1 (0x00007f1c6cafc000)
After the fix:
ldd ./check.exe
    linux-vdso.so.1 => (0x00007ffc637c7000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fa38f2de000)
    libmg5amc_common.so => not found
    libcurand.so.10 => /usr/local/cuda-10.2/lib64/libcurand.so.10 (0x00007fa38b23b000)
    libmg5amc_cxx.so => not found
    libstdc++.so.6 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libstdc++.so.6 (0x00007fa38f50c000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fa38af39000)
    libgcc_s.so.1 => /cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/lib64/libgcc_s.so.1 (0x00007fa38af1f000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa38ad03000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fa38a935000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fa38f4e2000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fa38a72d000)
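The ldd listings above are easiest to read by looking only at the "not found" entries, which are the libraries that would have to be located through LD_LIBRARY_PATH at run time. As a minimal sketch (not part of this PR; the function name `unresolved_libraries` and the shortened sample text are assumptions made for illustration), the following Python snippet extracts those entries from ldd output passed in as a string:

```python
def unresolved_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'.

    Names are returned in order of first appearance, without duplicates
    (ldd may list the same missing library more than once).
    """
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "name => target"; entries without "=>" are skipped.
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            lib = name.strip()
            if lib not in missing:
                missing.append(lib)
    return missing


# Shortened sample, taken from the "After the fix" listing for check.exe.
sample = """\
    linux-vdso.so.1 => (0x00007ffc637c7000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fa38f2de000)
    libmg5amc_common.so => not found
    libmg5amc_cxx.so => not found
    libm.so.6 => /lib64/libm.so.6 (0x00007fa38af39000)
"""

print(unresolved_libraries(sample))
```

For the sample this reports libmg5amc_common.so and libmg5amc_cxx.so, i.e. both libraries are now referenced by name only and must be found through the library search path.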
…) and other commands
…chk.exe explicitly
…ith that huge grid)
…eck, runGcheck (check is used in the CI historically)
…y need common.so)
…IB, fix debug build in src
…performance... BUT AN ISSUE IN CURAND!
…e and float (see madgraph5#5 and madgraph5#212)
…e it is only copied once for all AVX modes)
./tput/teeThroughputX.sh -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
This took 3 hours in total including the build (from scratch at least for ggttg and ggttggg)
STARTED AT Thu Feb 24 00:01:58 CET 2022
ENDED AT Thu Feb 24 02:58:09 CET 2022
…non POSIX conforming and depends on bash as a shell https://stackoverflow.com/a/36531884
Some CI jobs were printing out messages starting with "-e"...
Example https://github.com/madgraph5/madgraph4gpu/runs/5315934229?check_suite_focus=true
…en and all 5 processes)
The CI was giving errors as follows
https://github.com/madgraph5/madgraph4gpu/runs/5316173040?check_suite_focus=true
./check.exe: error while loading shared libraries: libmg5amc_common.so: cannot open shared object file: No such file or directory
./fcheck.exe: error while loading shared libraries: libmg5amc_common.so: cannot open shared object file: No such file or directory
./check.exe --common -p 2 32 2
./fcheck.exe 2 32 2
Avg ME (C++/C++) =
/bin/sh: 1: [: unexpected operator
/bin/sh: 1: [: unexpected operator
Avg ME (F77/C++) =
  File "<string>", line 1
    me1=; me2=; reldif=abs((me2-me1)/me1); print('Relative difference =', reldif); ok = reldif <= 2E-4; print ( '%s (relative difference %s 2E-4)' % ( ('OK','<=') if ok else ('ERROR','>') ) ); import sys; sys.exit(0 if ok else 1)
        ^
SyntaxError: invalid syntax
make: *** [Makefile:630: cmpFcheck] Error 1
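The SyntaxError in the log above is a secondary symptom: both executables failed to load, so the two average ME values substituted into the Python one-liner were empty. As a hedged sketch of the same relative-difference check written defensively (the function name `compare_average_me` is hypothetical and is not what the Makefile uses; the 2E-4 tolerance and the formula are taken from the one-liner in the log), the comparison could reject missing inputs explicitly:

```python
def compare_average_me(me1_text, me2_text, tolerance=2e-4):
    """Compare two average matrix elements given as text.

    Returns (ok, reldif), where reldif = abs((me2 - me1) / me1) and ok is
    True when reldif <= tolerance. Raises ValueError if a value is missing,
    is not a number, or if the reference value me1 is zero.
    """
    if not me1_text.strip() or not me2_text.strip():
        raise ValueError("missing average ME value (did the executable run?)")
    me1 = float(me1_text)  # float() also raises ValueError on non-numeric text
    me2 = float(me2_text)
    if me1 == 0.0:
        raise ValueError("reference average ME is zero")
    reldif = abs((me2 - me1) / me1)
    return reldif <= tolerance, reldif


# Example inputs in the two formats printed by the C++ and Fortran executables.
ok, reldif = compare_average_me("1.215805e-02", "1.2158051820303455E-002")
print("Relative difference =", reldif)
print("%s (relative difference %s 2E-4)" % (("OK", "<=") if ok else ("ERROR", ">")))
```

With empty inputs this raises a ValueError naming the actual problem instead of a syntax error inside generated code.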
…ke commands in the github CI
This will avoid CI failures in float tests that would look for the default double builds. The issue was introduced when I added a dependency of the check target on all.$(TAG)
… processes)
The CI was failing on the self-hosted GPU nodes with the following errors
https://github.com/madgraph5/madgraph4gpu/runs/5316634564?check_suite_focus=true
./check.exe --common -p 2 32 2
./fcheck.exe 2 32 2
Avg ME (C++/C++) = 1.215805e-02
Avg ME (F77/C++) = 1.2158051820303455E-002
/bin/bash: python: command not found
make: *** [Makefile:640: cmpFcheck] Error 127
…ge (must install python on the node!)
Revert "[shared] replace 'python' by 'python3' in Makefile (codegen and all 5 processes)"
This reverts commit ed9e5e8.
… all 5 processes)
"yum install python39" has been executed on the CI, but I still get
/bin/bash: python: command not found
…the only one with active developments (although I am not 100% certain that the jobs will be executed in this order...)
Ok all tests are finally succeeding in the CI, after fixing a few CI related issues. Amongst the latest changes in this PR in the last few days:
I am just running a few more tests to commit some logs, but otherwise this is ready to go.
./tput/teeThroughputX.sh -inlonly -flt -makej -makeclean -eemumu -ggtt -ggttgg
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -rmbhst
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -bridge
./tput/teeThroughputX.sh -eemumu -curhst
./tput/teeThroughputX.sh -eemumu -common
Check all logs are updated:
grep DATE tput/logs_*/*txt | sort -k2
Hi @roiser @oliviermattelaer, this is now complete and I am about to merge it. Sorry for the delay.
All checks have passed. Self merging.
This is a WIP PR to follow up on
For the moment there is only one commit, fixing the conflicts in #361.