Evaluate the performance penalty of the random selection of helicity and color (for SIMD and for GPUs) #608

valassi · 2023-03-28T16:28:01Z

Evaluate the performance penalty of the random selection of helicity and color. This is related to issue #607 and a useful chat with @zeniheisser.

The random selection of helicity and color (#402 and #403) is needed to produce LHE files, but it is expected to slow down the calculations.

First, this is particularly true for vectorised code: the random selection there is presently done by looping through the elements of SIMD vectors individually, and it is very difficult to imagine doing this faster while still maintaining some vectorization, since this is intrinsically stochastic branching. See for instance

madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc

Line 3120 in 93bee87

      for( int ieppV = 0; ieppV < neppV; ++ieppV )
      {
        const int ievt = ievt00 + ieppV;
        //printf( "sigmaKin: ievt=%4d rndhel=%f\n", ievt, allrndhel[ievt] );
        for( int ighel = 0; ighel < cNGoodHel; ighel++ )
        {
#if defined MGONGPU_CPPSIMD
          const bool okhel = allrndhel[ievt] < ( MEs_ighel[ighel][ieppV] / MEs_ighel[cNGoodHel - 1][ieppV] );
#else
          const bool okhel = allrndhel[ievt] < ( MEs_ighel[ighel] / MEs_ighel[cNGoodHel - 1] );
#endif
          if( okhel )
          {
            const int ihelF = cGoodHel[ighel] + 1; // NB Fortran [1,ncomb], cudacpp [0,ncomb-1]
            allselhel[ievt] = ihelF;
            //printf( "sigmaKin: ievt=%4d ihel=%4d\n", ievt, ihelF );
            break;
          }
        }
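The per-lane logic of the excerpt above can be isolated in a minimal standalone sketch. It assumes, as the division by the last element suggests, that `MEs_ighel` holds running sums of the matrix elements over good helicities. The names `kLanes`, `cumME` and `selectHelicity` are hypothetical and are not part of cudacpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD width for this sketch (the real code uses neppV).
constexpr std::size_t kLanes = 4;

// cumME[ighel][lane]: running sum of the MEs over good helicities 0..ighel, per lane (assumed layout).
// rnd[lane]: one uniform random number in [0,1) per event in the SIMD vector.
// Returns, per lane, the index of the first good helicity whose normalised running sum exceeds rnd.
std::array<int, kLanes> selectHelicity( const std::vector<std::array<double, kLanes>>& cumME,
                                        const std::array<double, kLanes>& rnd )
{
  assert( !cumME.empty() );
  std::array<int, kLanes> sel;
  sel.fill( -1 ); // -1 means "no helicity selected"
  const std::size_t nGoodHel = cumME.size();
  for( std::size_t lane = 0; lane < kLanes; ++lane ) // scalar loop over lanes, not vectorised
  {
    const double total = cumME[nGoodHel - 1][lane];
    for( std::size_t ighel = 0; ighel < nGoodHel; ++ighel )
    {
      if( rnd[lane] < cumME[ighel][lane] / total )
      {
        sel[lane] = static_cast<int>( ighel );
        break; // each lane exits at a different iteration: data-dependent branching
      }
    }
  }
  return sel;
}
```

Because the exit iteration depends on the random number of each event, the lanes of one SIMD vector generally stop at different helicities, which is why the loop is written lane by lane.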

In this respect, note in particular that some recent SIMD tests with the latest code (which includes the random choice of helicity and color) seem to indicate SIMD speedups (on gcc) below the x8 and x16 reached with the previous code (which did not include this random selection). See for instance the openlab talk https://indico.cern.ch/event/1225408/contributions/5243830/

Second, even on GPUs, some stochastic branching is expected that would degrade branch efficiency below 100%.

In practice, in this issue, I would suggest doing a few tests such as:

- first, implement the option to completely bypass the random selection (see #607, "Add the option to bypass the random choice of helicity and color (eg for reweighting)")
- then, compare code performance with the random selection and with the selection bypassed: recreate the SIMD plots above and see if the SIMD speedup is lower when the random selection is not bypassed, but also compare absolute throughputs on CPU and GPU in the two cases
- also, check explicitly if GPU branch efficiency is below 100% with the random selection (see #609, "Recompute GPU branch efficiency with a non-dummy random helicity/color selection")
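As a rough illustration of the first two tests, the sketch below compiles the selection in or out with a template flag and times both variants. The toy kernel, the flag `kSelect` and the helpers `runKernel` and `secondsFor` are all hypothetical: they are not the cudacpp implementation or the bypass option of #607, and the toy workload says nothing about real throughputs.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Toy stand-in for the kernel (an assumption of this sketch): per event, accumulate
// weights 1, 2, ..., nHel as a running sum and, if kSelect is true, pick the first
// helicity whose normalised running sum exceeds the event's random number.
template<bool kSelect>
double runKernel( const std::vector<double>& rnd, std::vector<int>& selhel, int nHel )
{
  double sum = 0;
  const double total = 0.5 * nHel * ( nHel + 1 ); // sum of the toy weights
  for( std::size_t ievt = 0; ievt < rnd.size(); ++ievt )
  {
    double running = 0;
    bool done = false;
    for( int ihel = 0; ihel < nHel; ++ihel )
    {
      running += ihel + 1;
      if( kSelect && !done && rnd[ievt] < running / total )
      {
        selhel[ievt] = ihel; // selection is skipped entirely when kSelect is false
        done = true;
      }
    }
    sum += running;
  }
  return sum;
}

// Wall-clock time in seconds for one call; the checksum keeps the work observable.
template<bool kSelect>
double secondsFor( const std::vector<double>& rnd, std::vector<int>& selhel, int nHel, double& checksum )
{
  const auto t0 = std::chrono::steady_clock::now();
  checksum = runKernel<kSelect>( rnd, selhel, nHel );
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>( t1 - t0 ).count();
}
```

Comparing `secondsFor<true>` and `secondsFor<false>` on the same inputs, at each SIMD build, would give the kind of with/without comparison suggested above. The checksums should agree in both cases, since the selection should not change the matrix elements.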

This was referenced Mar 28, 2023

Recompute GPU branch efficiency with a non-dummy random helicity/color selection #609

Open

Add the option to bypass the random choice of helicity and color (eg for reweighting) #607

Open
