I was wrong before: the CUDA thread count per thread block does matter for performance. Previously we had an unoptimized GPU version, so the difference was not easily visible.
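One way to compare block sizes empirically is to time the same kernel at several candidate values. The sketch below is only an illustration under assumptions: `scaleKernel`, `blocksFor`, the problem size, and the list of candidates are all hypothetical stand-ins, since the project's actual kernel is not shown here. It also prints the occupancy-based suggestion from `cudaOccupancyMaxPotentialBlockSize`, which may serve as a starting point but does not replace measuring the real kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the project's real kernel is not shown in this issue.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of blocks needed to cover n elements with the given block size (ceiling division).
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    const int n = 1 << 24;  // assumed problem size, for illustration only
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Candidate thread counts per block (multiples of the warp size).
    const int candidates[] = {32, 64, 128, 256, 512, 1024};
    for (int threads : candidates) {
        // Warm-up launch so one-time startup cost is not included in the timing.
        scaleKernel<<<blocksFor(n, threads), threads>>>(d_data, n, 1.0f);
        cudaDeviceSynchronize();

        // Timed launch, measured with CUDA events.
        cudaEventRecord(start);
        scaleKernel<<<blocksFor(n, threads), threads>>>(d_data, n, 1.0f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("threads/block = %4d  time = %.3f ms\n", threads, ms);
    }

    // Occupancy-based suggestion from the CUDA runtime, as a starting point only.
    int minGridSize = 0, suggested = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &suggested, scaleKernel, 0, 0);
    std::printf("occupancy-suggested threads/block = %d\n", suggested);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

A single timed launch per candidate is noisy; averaging several launches and running on the target GPU would give a more reliable comparison.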
Currently the CUDA thread count per thread block is set to 32. A different value should be set to achieve optimal performance.

Acceptance Criteria