-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Keep Bridge state across several calls? #329
Comments
@jtchilders , @nscottnichols , as I just briefly mentioned via email, I think this kind of design should also allow you to cleanly call an initialise and a finalise function for kokkos, within the fcreatebridge and fdeletebridge functions (or rather within the Bridge ctor/dtor which arecalled from those). Does this make sense or would you have other suggestions? Thanks |
This kind of bridge makes a lot of sense and works for me. |
Hi, I'm a bit wary about having 3 calls across the languages. I'd say this is our weakest point in the architecture, calling just c synbols. I had an idea on how to avoid it and woukd try it next week. Bon week-end, Stefan |
Hi, thanks @nscottnichols @jtchilders and @roiser for the feedback. Stefan, I do not see any problem with having three calls across languages. If we need initialization, computations and finalizations, that's the only way I see, or at least the cleanest. As I said in my first post, there are other ways with a singe call, but they leak resources, and in any case I find them less clean. I had not included the code, doing this now: File bridg2.cpp #include "Bridge.h"
void doStuff( Bridge*& bridge, double x ){ if ( !bridge ) bridge = new Bridge(); bridge->doStuff( x ); }
extern "C"
{
void fdostuff_( Bridge** bridge, double* x ){ doStuff( *bridge, *x ); }
} File prog2.f PROGRAM FPROG
IMPLICIT NONE
INTEGER*8 BRIDGE/0/
INTEGER ILOOP, NLOOP
DOUBLE PRECISION XLOOP
NLOOP = 5
DO ILOOP = 1, NLOOP
XLOOP = 1.1 * ILOOP
c WRITE(*,*) "XLOOP =", XLOOP
CALL FDOSTUFF( BRIDGE, XLOOP )
END DO
END PROGRAM FPROG Building and running
It works, but the Bridge is not deleted at the end. Note in any case that you still need some sort of initialisation (BRIDGE/0/) because in any case on the C++ side you must have a single instance of the relevant class(es). Again, if you want to avoid statics, allow multithreading, have a single helicity decision and a single reading from parameter file, I would go this way. Anyway, if you have an alternative try it out and let me know! |
PS @oliviermattelaer what's your take? |
@valassi concerning the calls, sorry I didn't explain myself enough. The problem I see is that we are just calling c symbols without any protection for what concerns the signature of the function, this looks very fragile to me. Therefore I would suggest to reduce these numbers of calls to a minimum (one if possible). I'll try sth and come back to you later. |
Hi Stefan, I do not think it's a good idea to avoid explicit create/delete calls, for the reasons above (mainly, resource leaks due to missing delete). I understand your concern about the missing signature check instead. I just had a quick look, fortran provides an easy way to do this via the INTERFACE statement. This is yest another version of the code (with three calls). Keep the same Bridge.h and bridge.cpp as above from the first post, then instead use these: File bridge_interface.inc INTERFACE
SUBROUTINE FCREATEBRIDGE(PBRIDGE)
INTEGER*8 PBRIDGE
END SUBROUTINE FCREATEBRIDGE
SUBROUTINE FDOSTUFF(PBRIDGE, XARG)
INTEGER*8 PBRIDGE
DOUBLE PRECISION XARG
END SUBROUTINE FDOSTUFF
SUBROUTINE FDELETEBRIDGE(PBRIDGE)
INTEGER*8 PBRIDGE
END SUBROUTINE FDELETEBRIDGE
END INTERFACE File fprog_include.f PROGRAM FPROG
IMPLICIT NONE
INCLUDE 'bridge_interface.inc'
INTEGER*8 BRIDGE
INTEGER ILOOP, NLOOP
DOUBLE PRECISION XLOOP
CALL FCREATEBRIDGE( BRIDGE )
NLOOP = 5
DO ILOOP = 1, NLOOP
XLOOP = 1.1 * ILOOP
c WRITE(*,*) "XLOOP =", XLOOP
CALL FDOSTUFF( BRIDGE, XLOOP )
END DO
CALL FDELETEBRIDGE( BRIDGE )
END PROGRAM FPROG Build and run:
Would this be better? This can even be further refined using MODULE in Fortran, but that looks like an overkill to me. Essentially what I propose is that also the bridge_interface.inc file would need to be created (and possibly tested?) from the cuda/c++. Essentially the cuda/c++ would then create BOTH a library (with the fcreatebridge, fdostuff and fdeletebridge symbols) AND an include file (declaring the interface for those symbols). The Fortran MadEvent would then simply include that inc file and link the library. Let me know... |
Hi @valassi , to me it doesn't change a lot IIUC, we can still call whatever symbol we like on the C side and it doesn't need to match what we define in fortran. As said, I have an idea and will come back to this thread hopefully soon. best Stefan |
Hi Stefan, what I proposed constrains the prog.f (so MadEvent) to use exactly the interface defined in the bridge_interface.inc, otherwise it fails File prog_include_wrong.f PROGRAM FPROG
IMPLICIT NONE
INCLUDE 'bridge_interface.inc'
INTEGER*8 BRIDGE
INTEGER ILOOP, NLOOP
c DOUBLE PRECISION XLOOP
REAL XLOOP
CALL FCREATEBRIDGE( BRIDGE )
NLOOP = 5
DO ILOOP = 1, NLOOP
XLOOP = 1.1 * ILOOP
c WRITE(*,*) "XLOOP =", XLOOP
CALL FDOSTUFF( BRIDGE, XLOOP )
END DO
CALL FDELETEBRIDGE( BRIDGE )
END PROGRAM FPROG Build fails
It does not force the bridge_interface.f to have the same interface as the C++ class however, you are right in that. This is why I suggest that also the inc file is created as part of the cuda/c++ library, it is the responsability of the library to keep the intenral consistency of the modules it offers with the Fortran interface it declares. To me that sounds enough. As I said one can start using Fortran MODULEs, but I think that might become an overkill without much added value IIUC. Anyway try out the other suggestion and let me know, no hurry! |
Yes but what I mean is that the c function signature can be anything, that's problematic to my mind, give me a bit more time ... |
Hi @valassi , here is an example on how it could be done. This should be thread safe and we would only need to call one bridge c symbol. All the initialisation/destruction can go into the constructor/destructor of the Bridge class and would be called once. It's a tiny bit safer and would keep the two worlds a bit better apart to my mind. The main() obviously will be the fortran part, but as I'm lazy keep all in one file for now. Please let me know what you think cheers
runs with
|
Hi Stefan, |
Sorry for the quick answer before, doing other things at the same time. I don't know, avoiding static is one of the things I said at the very beginning of this issue as being one goal. I agree that initially we will only have one bridge from Fortran, so we could actually have that static singleton in C++. I just find it cleaner and leaving more future flexibility to let the Fortran know it is talking to one specific object. Anyway, initially maybe we can go for your static. => What do other people think? @oliviermattelaer especially, but also @nscottnichols and @jtchilders ? Anyway, irrespective of whether we go for the static or an explicit instance, I would add a couple of comments:
What do you think? Thanks |
Hi @valassi thanks, I think we are on the same page. Just for clarification, I am worried about calling the c symbols. What this snippet above does it not to avoid the risk but it reduces it as we call only one (for now). Yes in case we need more we can add but I'm even not sure that we need so much more. I'd say we cross the bridge when we get there ;-), if we shall really need it should be easy to add more stuff. Let's see what others think about it. thanks |
I am finalising the new Bridge, as discussed. For completeness, this is the proof of concept implementation. Bridge3.h
bridge3.cpp
bridge3_interface.inc
prog3_include.f
Execution
|
… to eemumu fbridge.cc (madgraph5#329)
… - build fails because of multiple definitions ccache /usr/local/cuda-11.6/bin/nvcc -o runTest.exe ./CPPProcess.o ./RandomNumberKernels.o ./RamboSamplingKernels.o ./MatrixElementKernels.o ./BridgeKernels.o ./CrossSectionKernels.o ./fbridge.o ./testxxx.o ./testmisc.o ./runTest.o ./gCPPProcess.o ./gRandomNumberKernels.o ./gRamboSamplingKernels.o ./gMatrixElementKernels.o ./gBridgeKernels.o ./gCrossSectionKernels.o ./fbridge_cu.o ./testxxx_cu.o ./testmisc_cu.o ./runTest_cu.o -ldl -L../../lib -lmg5amc_common -L../../../../../test/googletest/build/lib/ -lgtest -lgtest_main -L/usr/local/cuda-11.6/lib64/ -lcurand -lcuda -lgomp /cvmfs/sft.cern.ch/lcg/releases/binutils/2.34-990b2/x86_64-centos7/bin/ld: ./fbridge_cu.o: in function `fbridgedelete_': tmpxft_00004196_00000000-6_fbridge.cudafe1.cpp:(.text+0x10): multiple definition of `fbridgedelete_'; ./fbridge.o:fbridge.cc:(.text+0x0): first defined here /cvmfs/sft.cern.ch/lcg/releases/binutils/2.34-990b2/x86_64-centos7/bin/ld: ./fbridge_cu.o: in function `fbridgesequence_': tmpxft_00004196_00000000-6_fbridge.cudafe1.cpp:(.text+0x170): multiple definition of `fbridgesequence_'; ./fbridge.o:fbridge.cc:(.text+0x90): first defined here /cvmfs/sft.cern.ch/lcg/releases/binutils/2.34-990b2/x86_64-centos7/bin/ld: ./fbridge_cu.o: in function `fbridgecreate_': tmpxft_00004196_00000000-6_fbridge.cudafe1.cpp:(.text+0x3a0): multiple definition of `fbridgecreate_'; ./fbridge.o:fbridge.cc:(.text+0xf0): first defined here collect2: error: ld returned 1 exit status
…ph5#329) - need to fix momenta/mes pointers...
I have completed a standalone Fortran test in PR #367. This is a Fortran program that fills a Fortran momenta array using a c++ sampler (common host + rambo sequence), and then computes MEs using a c++ or cuda bridge sequence. The results are the same in this Fortran program as they are in the standalone c++/cuda program. |
Hi @roiser, @oliviermattelaer
After our chat yesterday at some point I realised there may be one issue with the current Bridge interface: we keep creating a new object
madgraph4gpu/epoch2/fortran-cuda/pp_ttx/SubProcesses/fbridge.cu
Line 30 in 774a900
madgraph4gpu/epoch2/fortran-cuda/pp_ttx/SubProcesses/bridge.h
Line 115 in 774a900
This would not leak memory because it is stack allocated, however it has two issues
Since it is a bad idea to keep static instances in the cuda/c++ module, I think that it is somehow unavoidable to pass the knowledge of the Bridge C++ pointer to Fortran. I made a couple of quick tests and this looks quite easy.
For instance, look at this code
File Bridge.h
File prog.f
File bridge.cpp
Building and running:
There are other alternatives possible to avoid the FCREATEBRIDGE call, but then essentially one leaks the Bridge at the end of the program (unless overcomplicating it). In any case I think that getting the notion of a pointer (here as INTEGER*8 for an x64 double precision memory system) is kind of unavoidable.
This approach would also have other advantages, e.g. it allows the Fortran to cleanly pass any configuration parameters to teh Bridge after construction and before the first use, if we choose to do so. For instance one may pass as a single paramater the choice whether the ME calculation is on the CPU or the GPU. Or one can pass the number of blocks and threads in the GPU grid (or eventually more exotic configurations for heterogeneous computing).
Would you agree to this type of changes? Olivier do you see a problem implementing this in the Fortran? You essentially only need to declare an extra INTEGER*8 and then be sure you have an initialization call in that Frortran routine ("if first then create the bridge" will do, even if it leaks the memory at the end, but it is good enough for a beginning).
Thanks!
Andrea
The text was updated successfully, but these errors were encountered: