Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Choose an optimal thread count for CUDA kernels #632

Closed
sjsprecious opened this issue Aug 23, 2024 · 2 comments · Fixed by #665
Closed

Choose an optimal thread count for CUDA kernels #632

sjsprecious opened this issue Aug 23, 2024 · 2 comments · Fixed by #665
Assignees
Labels
enhancement New feature or request

Comments

@sjsprecious
Copy link
Collaborator

Currently the CUDA thread count per thread block is set to 32. A different value should be set to achieve an optimal performance.

Acceptance Criteria

  • Find out the optimal CUDA thread count for MICM GPU kernels.
@sjsprecious sjsprecious self-assigned this Aug 23, 2024
@sjsprecious sjsprecious added the enhancement New feature or request label Aug 23, 2024
@sjsprecious sjsprecious added this to the CUDA Rosenbrock Solver milestone Aug 23, 2024
@sjsprecious
Copy link
Collaborator Author

Based on the Nsight Systems profiling results, it seems varying the thread count per thread block affects the overall GPU performance marginally.

@sjsprecious
Copy link
Collaborator Author

I was wrong before that the CUDA thread count per thread block does matter for the performance. Previously we had an unoptimized GPU version so the difference was not easily visible.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant