
Evaluate the performance penalty of the random selection of helicity and color (for SIMD and for GPUs) #608

Open
valassi opened this issue Mar 28, 2023 · 0 comments


valassi commented Mar 28, 2023

Evaluate the performance penalty of the random selection of helicity and color. This is related to issue #607 and to a useful chat with @zeniheisser.

The random selection of helicity and color (#402 and #403) is needed to produce LHE files, but it is certainly expected to slow down the calculations.

First, this slowdown is expected to be particularly significant for vectorised code, because the random selection there is presently done by looping over the elements of SIMD vectors individually, and it is very difficult to imagine doing this faster while still maintaining some vectorisation: the selection involves intrinsically stochastic branching, since each event leaves the loop at a different, random iteration. See for instance


```cpp
for( int ieppV = 0; ieppV < neppV; ++ieppV )
{
  const int ievt = ievt00 + ieppV;
  //printf( "sigmaKin: ievt=%4d rndhel=%f\n", ievt, allrndhel[ievt] );
  // MEs_ighel[ighel] / MEs_ighel[cNGoodHel - 1] acts as a cumulative probability:
  // select the first good helicity whose cumulative fraction exceeds the random number
  for( int ighel = 0; ighel < cNGoodHel; ighel++ )
  {
#if defined MGONGPU_CPPSIMD
    const bool okhel = allrndhel[ievt] < ( MEs_ighel[ighel][ieppV] / MEs_ighel[cNGoodHel - 1][ieppV] );
#else
    const bool okhel = allrndhel[ievt] < ( MEs_ighel[ighel] / MEs_ighel[cNGoodHel - 1] );
#endif
    if( okhel )
    {
      const int ihelF = cGoodHel[ighel] + 1; // NB Fortran [1,ncomb], cudacpp [0,ncomb-1]
      allselhel[ievt] = ihelF;
      //printf( "sigmaKin: ievt=%4d ihel=%4d\n", ievt, ihelF );
      break;
    }
  }
  // ... remainder of the loop body omitted from this excerpt
}
```

In this respect, note in particular that some recent SIMD tests with the latest code (which includes the random choice of helicity and color) seem to show SIMD speedups (with gcc) below the x8 and x16 reached with the previous code, which did not include this random selection. See for instance the openlab talk https://indico.cern.ch/event/1225408/contributions/5243830/
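A rough way to see why even a small scalar section can noticeably reduce the SIMD speedup is Amdahl's law, used here only as an illustration: if a fraction f of the runtime of the scalar build cannot be vectorised, and the rest speeds up perfectly by the SIMD width S, the overall speedup is bounded as below. The value f = 5% is purely hypothetical, not a measurement.

```latex
\text{speedup}(S, f) = \frac{1}{(1 - f)/S + f}

\text{speedup}(8, 0.05) = \frac{1}{0.95/8 + 0.05} = \frac{1}{0.16875} \approx 5.9
\qquad
\text{speedup}(16, 0.05) = \frac{1}{0.95/16 + 0.05} = \frac{1}{0.109375} \approx 9.1
```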

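Second, even on GPUs some stochastic branching is expected: threads in the same warp will generally select different helicities (and colors) and leave the selection loop at different iterations, which would degrade branch efficiency below 100%.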

In practice, in this issue, I would suggest doing a few tests such as
