Commit
REMOVE GLOBAL AND SHARED code for wavefunctions (issue #7).
Keep only the "local" w[5][6] and focus on that.

time ./gcheck.exe -p 65536 128 1
***************************************
NumIterations = 1
NumThreadsPerBlock = 128
NumBlocksPerGrid = 65536
---------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = THRUST::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Wavefunction GPU memory = LOCAL
Curand generation = DEVICE (CUDA code)
---------------------------------------
NumberOfEntries = 1
TotalTimeInWaveFuncs = 1.147885e-02 sec
MeanTimeInWaveFuncs = 1.147885e-02 sec
StdDevTimeInWaveFuncs = 0.000000e+00 sec
MinTimeInWaveFuncs = 1.147885e-02 sec
MaxTimeInWaveFuncs = 1.147885e-02 sec
---------------------------------------
TotalEventsComputed = 8388608
RamboEventsPerSec = 8.288389e+07 sec^-1
MatrixElemEventsPerSec = 7.307881e+08 sec^-1
***************************************
NumMatrixElements(notNan) = 8388608
MeanMatrixElemValue = 1.371734e-02 GeV^0
StdErrMatrixElemValue = 2.831148e-06 GeV^0
StdDevMatrixElemValue = 8.199880e-03 GeV^0
MinMatrixElemValue = 6.071582e-03 GeV^0
MaxMatrixElemValue = 3.374925e-02 GeV^0
***************************************
00 CudaFree : 0.145482 sec
0a ProcInit : 0.000564 sec
0b MemAlloc : 0.650950 sec
0c GenCreat : 0.014307 sec
1a GenSeed : 0.000006 sec
1b GenRnGen : 0.000689 sec
2a RamboIni : 0.000024 sec
2b RamboFin : 0.000006 sec
2c CpDTHwgt : 0.008415 sec
2d CpDTHmom : 0.092765 sec
3a SGoodHel : 0.024875 sec
3b SigmaKin : 0.000018 sec
3c CpDTHmes : 0.011461 sec
4a DumpLoop : 0.030057 sec
9a DumpAll : 0.031446 sec
9b GenDestr : 0.000062 sec
9c MemFree : 0.274781 sec
9d CudReset : 0.044059 sec
TOTAL : 1.329966 sec
TOTAL(n-2) : 1.140425 sec
***************************************
real 0m1.341s
user 0m0.220s
sys 0m1.113s

time ./check.exe -p 65536 128 1
***************************************
NumIterations = 1
NumThreadsPerBlock = 128
NumBlocksPerGrid = 65536
---------------------------------------
FP precision = DOUBLE (nan=0)
Complex type = STD::COMPLEX
RanNumb memory layout = AOSOA[4]
Momenta memory layout = AOSOA[4]
Curand generation = HOST (C++ code)
---------------------------------------
NumberOfEntries = 1
TotalTimeInWaveFuncs = 2.303040e+01 sec
MeanTimeInWaveFuncs = 2.303040e+01 sec
StdDevTimeInWaveFuncs = 0.000000e+00 sec
MinTimeInWaveFuncs = 2.303040e+01 sec
MaxTimeInWaveFuncs = 2.303040e+01 sec
---------------------------------------
TotalEventsComputed = 8388608
RamboEventsPerSec = 2.971090e+06 sec^-1
MatrixElemEventsPerSec = 3.642407e+05 sec^-1
***************************************
NumMatrixElements(notNan) = 8388608
MeanMatrixElemValue = 1.371734e-02 GeV^0
StdErrMatrixElemValue = 2.831148e-06 GeV^0
StdDevMatrixElemValue = 8.199880e-03 GeV^0
MinMatrixElemValue = 6.071582e-03 GeV^0
MaxMatrixElemValue = 3.374925e-02 GeV^0
***************************************
0a ProcInit : 0.000398 sec
0b MemAlloc : 1.254668 sec
0c GenCreat : 0.000983 sec
1a GenSeed : 0.000003 sec
1b GenRnGen : 0.455943 sec
2a RamboIni : 0.128670 sec
2b RamboFin : 2.694741 sec
3b SigmaKin : 23.030397 sec
4a DumpLoop : 0.016648 sec
9a DumpAll : 0.032200 sec
9b GenDestr : 0.000091 sec
9c MemFree : 0.143718 sec
TOTAL : 27.758461 sec
TOTAL(n-2) : 27.614344 sec
***************************************
real 0m27.765s
user 0m26.679s
sys 0m1.067s