Evaluate the performance penalty of the random selection of helicity and color. This is related to issue #607 and a useful chat with @zeniheisser.
The random selection of helicity and color (#402 and #403) is needed for the functionality of producing LHE files, but is certainly expected to slow down the calculations.
First, this is particularly true for vectorised code, as the random selection there is presently done by looping through the elements of SIMD vectors individually, and it is very difficult to imagine doing this faster while still maintaining some vectorization: this is intrinsically stochastic branching. See for instance madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc, line 3120 in 93bee87.
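To make the per-element loop concrete, here is a minimal sketch of what such a selection can look like. It is not the code in CPPProcess.cc: the names `kLanes`, `kNhel`, `LaneWeights` and `selectHelicity` are hypothetical, and the assumption that the choice is made by comparing a uniform random number against a running sum of per-helicity weights is mine, for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes, for illustration only
constexpr std::size_t kLanes = 4; // events per SIMD vector (e.g. 4 doubles with AVX2)
constexpr std::size_t kNhel = 3;  // number of helicity combinations (toy value)

// weights[ihel][lane]: contribution of helicity ihel for the event in this lane
using LaneWeights = std::array<std::array<double, kLanes>, kNhel>;

// Select one helicity per lane from a uniform random number in [0,1).
// The outer loop visits the lanes one at a time: each lane leaves the inner
// loop at a different ihel, and this data-dependent exit is what resists
// vectorization.
std::array<int, kLanes> selectHelicity(const LaneWeights& weights,
                                       const std::array<double, kLanes>& rnd)
{
  std::array<int, kLanes> selected{};
  for (std::size_t lane = 0; lane < kLanes; ++lane)
  {
    assert(rnd[lane] >= 0. && rnd[lane] < 1.);
    // Sum over all helicities for this lane (normalisation of the weights)
    double total = 0.;
    for (std::size_t ihel = 0; ihel < kNhel; ++ihel) total += weights[ihel][lane];
    // Pick the first helicity whose running sum exceeds rnd * total
    double running = 0.;
    selected[lane] = static_cast<int>(kNhel) - 1; // fallback against rounding
    for (std::size_t ihel = 0; ihel < kNhel; ++ihel)
    {
      running += weights[ihel][lane];
      if (rnd[lane] * total < running)
      {
        selected[lane] = static_cast<int>(ihel);
        break;
      }
    }
  }
  return selected;
}
```

The weight sums themselves vectorize well across lanes; it is only the comparison and early exit that end up being done lane by lane in this sketch.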
In this respect, note in particular that some recent SIMD tests with the latest code (which includes the random choice of helicity and color) seem to indicate SIMD speedups (on gcc) below the x8 and x16 reached with the previous code (which did not include this random selection). See for instance the openlab talk https://indico.cern.ch/event/1225408/contributions/5243830/
Second, even on GPUs, some stochastic branching is expected that would degrade branch efficiency below 100%.
In practice, in this issue, I would suggest doing a few tests such as
then, compare code performance with the random selection enabled and with the selection bypassed: recreate the SIMD plots above and check whether the SIMD speedup is lower when the random selection is not bypassed, and also compare absolute throughputs on CPU and GPU in the two cases
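The comparison could be structured along the lines of the sketch below. Everything in it is hypothetical: `processEvents` is a toy stand-in for the per-event work (not the real matrix element calculation), the `Bypass` template flag is an assumed way of switching the selection off at compile time, and `throughput` is an illustrative timing helper.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Toy per-event kernel. With Bypass=false, a data-dependent comparison mimics
// the random selection; with Bypass=true, that branch is compiled out.
template<bool Bypass>
double processEvents(const std::vector<double>& rnd, std::vector<int>& selected)
{
  constexpr int nhel = 16; // hypothetical number of helicity combinations
  double sum = 0.;
  for (std::size_t ievt = 0; ievt < rnd.size(); ++ievt)
  {
    double running = 0.;
    int chosen = -1; // stays -1 when the selection is bypassed
    for (int ihel = 0; ihel < nhel; ++ihel)
    {
      running += 1. / nhel; // toy: equal weight for every helicity
      if (!Bypass && chosen < 0 && rnd[ievt] < running) chosen = ihel;
    }
    selected[ievt] = chosen;
    sum += running;
  }
  return sum;
}

// Events per second for one pass over the input (single, unaveraged timing)
template<bool Bypass>
double throughput(const std::vector<double>& rnd)
{
  std::vector<int> selected(rnd.size());
  const auto t0 = std::chrono::steady_clock::now();
  volatile double sink = processEvents<Bypass>(rnd, selected); // keep the result alive
  const auto t1 = std::chrono::steady_clock::now();
  (void)sink;
  const double seconds = std::chrono::duration<double>(t1 - t0).count();
  return seconds > 0. ? rnd.size() / seconds : 0.;
}
```

Calling `throughput<true>(rnd)` and `throughput<false>(rnd)` on the same input and taking the ratio gives the penalty for this toy kernel only; the real measurement would repeat the same two-way comparison for each SIMD build and on GPU.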