
madeventfortran vs madeventcpp vs madeventcuda #674

Open
valassi opened this issue Jun 2, 2023 · 11 comments

@valassi
Member

valassi commented Jun 2, 2023

Hi @roiser (and @oliviermattelaer), this is one part of #658, with WIP by Stefan in MR #620

There are several rather different parts in #658 "remove non standard features in madevent". What I want to discuss here is: how does the launch machinery choose whether to build/run the fortran, cpp or cuda MEs?

Stefan has some good points in MR #620.

For instance
237620c

diff --git a/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile b/epochX/cudacpp/gg_tt.mad/SubProcesses/makefile
... 
+.PHONY: madeventfortran madeventcuda madeventcpp
+
+madeventfortran: $(PROG)
+
+madeventcpp: $(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp
+       rm -f $(CUDACPP_BUILDDIR)/$(PROG)
+       cd $(CUDACPP_BUILDDIR); ln -s c$(PROG)_cudacpp madevent
+
+madeventcuda: $(CUDACPP_BUILDDIR)/c$(PROG)_cudacpp
+       rm -f $(CUDACPP_BUILDDIR)/$(PROG)
+       cd $(CUDACPP_BUILDDIR); ln -s g$(PROG)_cudacpp madevent
+

(Two suggestions on the above: in the last rule, the dependency should be g$(PROG)_cudacpp rather than c$(PROG)_cudacpp; and should we keep a copy of the fortran madevent anyway?)
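
For concreteness, a minimal sketch of the last rule with the first suggestion applied (same variables as in the diff above; where to keep the fortran copy is left open):

```makefile
# sketch only: madeventcuda should depend on the CUDA executable (g prefix),
# matching the symlink it creates, rather than on the C++ one (c prefix)
madeventcuda: $(CUDACPP_BUILDDIR)/g$(PROG)_cudacpp
	rm -f $(CUDACPP_BUILDDIR)/$(PROG)
	cd $(CUDACPP_BUILDDIR); ln -s g$(PROG)_cudacpp madevent
```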

And then
15b5974
eb1bcb1

git diff upstream/master epochX/cudacpp/gg_tt.mad/bin/internal/madevent_interface.py
______________________________________________________________________________
git diff /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/bin/internal/madevent_interface.py
...
<                     self.compile(['madevent'], cwd=Pdir)
<                     
---
> 
>                     exec_mode = 0
>                     if 'exec_mode' in self.run_card:
>                         exec_mode = self.run_card['exec_mode']
> 
>                     if exec_mode == '0':
>                         self.compile(['madeventfortran'], cwd=Pdir)
>                     elif exec_mode == '1':
>                         self.compile(['madeventcpp'], cwd=Pdir)
>                     elif exec_mode == '2':
>                         self.compile(['madeventcuda'], cwd=Pdir)
>                     else:
>                         self.compile(['all'], cwd=Pdir)
> 

If I understand well, the strategy here is:

  1. runtime: do not modify the scripts that expect "madevent" as an executable name
  2. build time: do modify the build scripts, so that the "madevent" executable is the fortran, cpp or cuda version depending on a choice embedded in the run card
    @roiser, I have not tried this out but I assume this is the general idea right?

To me, this looks very nice :-)

I mean, we need to make a choice whether the "switch" between implementations is at runtime or at build time. It is easier if it is only in one of the two. Having it at runtime is handy, but in production you want the option to ONLY build what you need (or at least, aim for that). So definitely it is better to have the logic at build time - and it looks quite simple here.

I just have a few questions/suggestions. This is mainly/only about the build in point 2 (as point 1 is essentially a no-op)

  • I would choose better names for the executables (it was me who suggested the first horrible ones, I take it back): I propose madevent_MEfortran, madevent_MEcpp and madevent_MEcuda or something similar. That is to say: replace cmadevent_cudacpp by madevent_MEcpp for instance.
  • As implied in the point above, I would always keep a copy of the fortran madevent version if it exists. So, instead of just deleting the fortran madevent when replacing it by a link for the cpp or cuda version, keep a copy as madevent_MEfortran.
  • One big question (for @oliviermattelaer) is whether these changes should ALWAYS be there (so in the templates, to end up also in the production fortran version that does not invoke cudacpp) or ONLY when we generate these through cudacpp.
  • Another big question, which is related, is how we imagine handling sycl, or direct hip, in the future. We would probably need some similar madevent_MEsycl (same for cpp and gpu? hm not sure) or madevent_MEhip. Therefore I wonder if it is better/possible to modify Stefan's run inc to use labels rather than numbers: not 0 for fortran, 1 for cpp and 2 for cuda, but 'fortran', 'cpp', 'cuda' explicitly? Then it would become natural to extend it to sycl or hip etc - AND it may look nicer, more flexible and less intrusive in the generic template (the goal is to write that piece of code WITHOUT mentioning cuda, cpp etc, just generic labels? a rough sketch of this is shown after this list)
  • A third big question is how to handle the different AVX implementations in cpp. I think Stefan's first prototype has a fixed AVX? Or relies on cudacpp to choose what it considers the best AVX? I would definitely like to have an option to build ALL simd versions, in one go, for performance tests. So maybe we can aim to have madevent_MEfortran, madevent_MEcuda, madevent_MEcppnone, madevent_MEcpp512z?
  • Related to all of the above, I need to understand how to build all backends but then choose a specific one for the launch. Is this something we need or not? OR can we generate ONCE, then use a different runcard for each backend, and type a launch for each backend, which will result in a different build and a different run? Not sure if this is handled in Stefan's patch yet.
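
To make the label idea in the fourth bullet concrete: the run card lookup in the madevent_interface.py diff quoted above could become something like this rough sketch (it keeps the exec_mode key and the madevent* makefile target names from Stefan's patch; the label values themselves are just the suggestion made here):

```python
# sketch only: dispatch on a string label from the run card instead of a number
exec_mode = 'fortran'
if 'exec_mode' in self.run_card:
    exec_mode = str(self.run_card['exec_mode']).lower()
if exec_mode in ('fortran', 'cpp', 'cuda'):   # extendable to 'sycl', 'hip', ...
    self.compile(['madevent%s' % exec_mode], cwd=Pdir)
else:
    self.compile(['all'], cwd=Pdir)
```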

Voila... any thoughts? I am sure there may also be other questions I am forgetting. I think it's better to agree on these goals first.

Thanks
Andrea

@roiser
Member

roiser commented Jun 2, 2023

If I understand well, the strategy here is:

1. runtime: do not modify the scripts that expect "madevent" as an executable name

absolutely, I wanted to be as unintrusive as possible.

2. build time: do modify the build scripts, so that the "madevent" executable is the fortran, cpp or cuda version depending on a choice embedded in the run card
   @roiser, I have not tried this out but I assume this is the general idea right?

yes, this was the idea; it works e.g. for generating the gridpacks for the vector CPU and CUDA versions

I mean, we need to make a choice whether the "switch" between implementations is at runtime or at build time. It is easier if it is only in one of the two. Having it at runtime is handy, but in production you want the option to ONLY build what you need (or at least, aim for that). So definitely it is better to have the logic at build time - and it looks quite simple here.

that was also my reasoning behind those changes

I just have a few questions/suggestions. This is mainly/only about the build in point 2 (as point 1 is essentially a no-op)

* I would choose better names for the executables (it was me who suggested the first horrible ones, I take it back): I propose `madevent_MEfortran`, `madevent_MEcpp` and `madevent_MEcuda` or something similar. That is to say: replace `cmadevent_cudacpp` by `madevent_MEcpp` for instance.

yes, good idea. I also just wanted to be as unintrusive as possible with respect to the current cuda/cpp version, so please go ahead; it will also make it more usable / understandable for people outside ;-). I understand why you want to include the "ME" in the name, but in view of the software evolving further and including more stuff on the cpp/cuda side, one may consider removing ME from the executable names?

* As implied in the point above, I would always keep a copy of the fortran madevent version if it exists. So, instead of just deleting the fortran madevent when replacing it by a link for the cpp or cuda version, keep a copy as `madevent_MEfortran`.

yes, I think it's a good idea

* Another big question, which is related, is how we imagine handling sycl, or direct hip, in the future. We would probably need some similar `madevent_MEsycl` (same for cpp and gpu? hm not sure) or `madevent_MEhip`. Therefore I wonder if it is better/possible to modify Stefan's run inc to use labels rather than numbers: not 0 for fortran, 1 for cpp and 2 for cuda, but 'fortran', 'cpp', 'cuda' explicitly? Then it would become natural to extend it to sycl or hip etc - AND it may look nicer, more flexible and less intrusive in the generic template (the goal is to write that piece of code WITHOUT mentioning cuda cpp etc, just generic labels?)

if that is possible, yes; I was just following what I thought are the conventions in the run cards. You may want to check: IIRC in some cases the values of the run card options are converted to numbers in the python code, but not in all. You may also want to check my diffs, where I explicitly convert to a number in the python.

* A third big question is how to handle the different AVX implementations in cpp. I think Stefan's first prototype has a fixed AVX? Or relies on cudacpp to choose what it considers the best AVX? I would definitely like to have an option to build ALL simd versions, in one go, for performance tests. So maybe we can aim to have `madevent_MEfortran`, `madevent_MEcuda`, `madevent_MEcppnone`, `madevent_MEcpp512z`?

My proposal relies on the "Makefile intelligence" to build the correct vector size version; building all of them is a good idea (if we also want to build cuda/cpp/fortran in parallel).

* Related to all of the above, I need to understand how to build all backends but then choose a specific one for the launch. Is this something we need or not? OR can we generate ONCE, then use a different runcard for each backend, and type a launch for each backend, which will result in a different build and a different run? Not sure if this is handled in Stefan's patch yet.

I just wanted to extend the previous question in this direction ;-). My initial idea was to keep the "madevent" (via the symlink) fixed for a given architecture/language/vector size. Thinking about it again, and with all the directions our discussion is taking above (i.e. build all possible versions), I am wondering if we need a little layer on top before executing, which finds out the capabilities of the platform and sets the symlink madevent->binary version before running it?
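
(As a strawman for such a layer, something as small as the sketch below could already do it; the executable names and the CPU-flag heuristics are purely illustrative, not what the makefiles currently produce.)

```python
#!/usr/bin/env python3
# hypothetical pre-launch helper: pick the "best" available madevent build for
# this platform and point the 'madevent' symlink at it before running
import os, shutil

def pick_backend(builddir):
    candidates = []
    if shutil.which('nvidia-smi'):                       # crude GPU detection
        candidates.append('madevent_cuda')
    cpuflags = open('/proc/cpuinfo').read()              # crude SIMD detection
    for flag, exe in (('avx512f', 'madevent_cpp512z'),
                      ('avx2', 'madevent_cppavx2'),
                      ('sse4_1', 'madevent_cppsse4')):
        if flag in cpuflags:
            candidates.append(exe)
            break
    candidates.append('madevent_fortran')                # fallback
    for exe in candidates:
        if os.path.exists(os.path.join(builddir, exe)):
            return exe
    raise RuntimeError('no madevent executable found in %s' % builddir)

def set_madevent_symlink(builddir):
    target = pick_backend(builddir)
    link = os.path.join(builddir, 'madevent')
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(target, link)   # scripts that expect "madevent" keep working
```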

@valassi
Member Author

valassi commented Jun 2, 2023

Hi Stefan, thanks :-)
Looks like we are in sync on most points (including where we do not know yet what's best!).

Ok I will remove the ME from names, easier.

About AVX "intelligence", that intelligence (that I put there) is a bit limited! Also, we do need to be able to build the "not best" (according to that intelligence), to make comparisons. At least I want to make those comparisons. Will think about it a bit. So we need a bit more than you implemented so far.

The last point is the most delicate - because essentially it is about destroying assumption "1" that we just use madevent and that's it. One option is to have different build directories (this would also depend a bit on #502, trying to separate the .f from the .o files in different directories, but that would be a big change). But I am just wondering if it is not better/cleaner (even if it has a LOT of file overhead) to just treat the cuda, fortran and AVX builds in separate source directories. Not yet sure, I need to have a look.

In case, @oliviermattelaer: would you agree if I try to go in the direction of #502, and separate the build products in different subdirectories, also for the Fortran files?

I'll see what I can do ... I need to do some tests/prototyping to understand what makes sense

@valassi
Member Author

valassi commented Jun 2, 2023

PS Just to dump it explicitly on the table: one issue (in favor of #502 or of having separate source directories - and against using a single .o build of fortran files for all cases) is the usual vecsize_memmax problem. It is true that I added this horrible hack of vecsize_memmax vs vecsize_used, and in principle we can always use memmax=16k for both cuda and cpp, even if in cuda we have used=16k and in cpp we have used=8 for simd. But I think this is probably inefficient. For c++, where we only need 8-32 events in parallel, it is really cumbersome to have arrays dimensioned for 16k events, which are unused.

In my ideal world maybe it is like this?

  • you can do the code generation once (so you do specify a vecsize when you type output madevent vector cudacpp)... because the code generation for complex processes may be very long, and you do NOT want to do it once for fortran, once for cuda, once for every avx etc
  • then before the build you can set the vecsize_memmax in the runcards, so each build gets arrays of the right size (NB this makes my whole rant of vecsize_used useless, because it is natural to have used=memmax, but having the functionality does not harm): this however implies that you MUST have different builds (and a different source file, vector.inc) for the various cases... so you just cannot have driver.o (and many other .o files) shared for cuda and cpp (a small illustration follows this list)
  • then in the runtime you use that specific build
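
(As a small illustration of the second bullet: each build would carry its own generated copy of the include with the appropriate size, roughly as below; the exact layout of vector.inc may differ, the point is only the value of vecsize_memmax.)

```fortran
C     sketch only: per-build vector.inc, with the size set before the build
C     (e.g. 16384 for a cuda build, while a cpp/simd build might use e.g. 32)
      INTEGER VECSIZE_MEMMAX
      PARAMETER (VECSIZE_MEMMAX=16384)
```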

So again I see only two options: moving the .o files away from the .f files into a separate directory (at least for the fortran code that depends on vector.inc), or otherwise duplicating the source code, so that you have gg_tt.mad_cuda with vector.inc at 16k and gg_tt.mad_cpp with vector.inc at 32, or similar...

Am I missing other options?

PS anyway, maybe this is lower priority: for the very first release we can have memmax=16k in all cases, and then cpp will be a bit less efficient, using too much memory and maybe a bit slower than it could be

@valassi
Member Author

valassi commented Jun 2, 2023

I am taking a different route to this. I started working on the removal of patchMad.sh #656 in MR #675

The idea is that the modified Fortran makefiles will now (still, but permanently) belong to the cudacpp plugin. So it becomes kind of less important how 'polluted' the makefile becomes with cudacpp stuff, because the changes will ONLY be for cudacpp and not in general for unmodified Fortran.

@valassi
Member Author

valassi commented Jun 4, 2023

(One tiny related issue here in build cleanups is #680 - better avoid building CUDA in each C++ AVX mode)

@roiser
Member

roiser commented Jun 6, 2023

I am taking a different route to this. I started working on the removal of patchMad.sh #656 in MR #675

The idea is that the modified Fortran makefiles will now (still, but permanently) belong to the cudacpp plugin. So it becomes kind of less important how 'polluted' the makefile becomes with cudacpp stuff, because the changes will ONLY be for cudacpp and not in general for unmodified Fortran

Do we have a mechanism to "rebase" on top of the proper madgraph repo?

@roiser
Member

roiser commented Jun 6, 2023

Another comment, you may not like it ;-). Those ideas about separate build directories are already an integral part of CMake. Maybe we should consider the move before going down the route with GNU Makefiles?

@valassi
Member Author

valassi commented Jun 6, 2023

Another comment, you may not like it ;-). Those ideas about separate build directories are already an integral part of CMake. Maybe we should consider the move before going down the route with GNU Makefiles?

Hi, well, cmake is something I am always open to discussing eventually :-)

However:

  • for the cudacpp part, build directories are already implemented with a custom cudacpp.mk and all is fine, so no need for cmake here
  • the real issue is in the Fortran Makefile, where the .f files give .o files in the same directory, and it is this part that I suggest fixing... I would find it easier to do it in Makefiles at least initially, unless Olivier and colleagues agree to move ALL of madgraph to cmake (a rough Makefile sketch of what I mean is at the end of this comment)

Now, moving ALL of Madgraph to cmake is something I would even like at some point. But it's a lot of work, it demands a complete rewriting of the build infrastructure there, and it requires all of the core team to agree with it. So I would say, to be discussed...
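
To illustrate the Makefile-only direction for the second bullet above (variable and directory names are purely illustrative):

```makefile
# sketch only: compile the fortran sources out of source, so that several builds
# (fortran, cuda, one per simd mode) can coexist next to the same .f files
FORTRAN_BUILDDIR ?= build_fortran
$(FORTRAN_BUILDDIR)/%.o: %.f
	@mkdir -p $(FORTRAN_BUILDDIR)
	$(FC) $(FFLAGS) -c $< -o $@
```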

@valassi
Member Author

valassi commented Jun 6, 2023

I am taking a different route to this. I started working on the removal of patchMad.sh #656 in MR #675
The idea is that the modified Fortran makefiles will now (still, but permanently) belong to the cudacpp plugin. So it becomes kind of less important how 'polluted' the makefile becomes with cudacpp stuff, because the changes will ONLY be for cudacpp and not in general for unmodified Fortran

Do we have a mechanism to "rebase" on top of the proper madgraph repo?

My idea here is to include the original file from the madgraph repo in the plugin, and write a simple script that creates the equivalent of patch.P1 and patch.common. So

  • when the plugin runs, it checks if the unpatched files are those kept in the plugin or not
  • if the plugin realises that it is using the latest mg5amcnlo, it goes on ok
  • if instead it realises that it is not, it gives an error, and tells you that you must update the plugin... but it also contains internally the tools to do that, easily

To be rediscussed later on, but this is definitely the fastest way I see. No need to worry about how intrusively we are modifying the common madgraph for everyone: we just modify the files we need with complete freedom. This is the same freedom as we have had so far with patch.P1 and patch.common, BUT in addition things work out of the box in the plugin without a patchMad.sh a posteriori
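
(As an illustration of the check in the first bullet above, the script could be as simple as the following sketch; the function and argument names are hypothetical, not actual plugin code.)

```python
# hypothetical sketch: verify that the upstream mg5amcnlo file we are about to
# patch is the one this version of the plugin was written against, by comparing
# it with the reference copy bundled inside the plugin
import hashlib

def check_upstream_file(mg5_path, plugin_ref_path):
    def sha256(path):
        with open(path, 'rb') as f:
            return hashlib.sha256(f.read()).hexdigest()
    if sha256(mg5_path) != sha256(plugin_ref_path):
        raise RuntimeError('upstream file %s differs from the copy bundled in '
                           'the plugin: please update the plugin first' % mg5_path)
```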

@roiser
Member

roiser commented Jun 6, 2023

Ok, IIRC we can fix the plugin to only work with a given (set of) version(s) of madgraph? That may give some additional security in that case.

@valassi
Member Author

valassi commented Jun 6, 2023

Ok, IIRC we can fix the plugin to only work with a given (set of) version(s) of madgraph? That may give some additional security in that case.

Ah, I had not thought of that yet. I am focusing on getting one version ready now. Anyway, it will HAVE to use the latest gpucpp branch. So the experiments are forced to upgrade to that whether they like it or not. We need, and they need, the vector interface in fortran to have any cpu vectorization or gpu...
