timing / benchmarking kernels from RustaCUDA code #29
Comments
Hey, thanks for the patience - I've been on vacation for a week and a half or so. That's a fair point; it's not obvious how to measure execution time for CUDA kernels. The quick-and-dirty version is to enqueue an event, launch the kernel, and enqueue another event, then sync. Afterwards the events can provide the time when they were processed so you can subtract them to get the time for the kernel. Or you could, if RustaCUDA supported events. It will take some work on both RustaCUDA and Criterion.rs but I think I can improve on this. |
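For anyone unfamiliar with the event pattern described above, here is a rough model of it using only the Rust standard library. This is not RustaCUDA or CUDA API: `SimStream`, `Command`, and `time_with_events` are hypothetical names, and a worker thread stands in for the GPU stream. The point it tries to show is that an event's timestamp is taken when the stream reaches it, not when the host enqueues it.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Commands accepted by the simulated in-order stream (hypothetical).
enum Command {
    /// Stand-in for a kernel launch: the worker runs the closure.
    Kernel(Box<dyn FnOnce() + Send>),
    /// Stand-in for an event: the worker reports when it reached it.
    Event(mpsc::Sender<Instant>),
}

/// Hypothetical stand-in for a CUDA stream: a worker thread that
/// processes commands strictly in the order they were enqueued.
struct SimStream {
    tx: mpsc::Sender<Command>,
}

impl SimStream {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Command>();
        thread::spawn(move || {
            // The loop ends once the SimStream (and its sender) is dropped.
            for cmd in rx {
                match cmd {
                    Command::Kernel(work) => work(),
                    Command::Event(reply) => {
                        let _ = reply.send(Instant::now());
                    }
                }
            }
        });
        SimStream { tx }
    }

    /// Enqueue work and return immediately, like an asynchronous launch.
    fn launch(&self, work: impl FnOnce() + Send + 'static) {
        self.tx.send(Command::Kernel(Box::new(work))).unwrap();
    }

    /// Enqueue an event; the receiver yields its timestamp once processed.
    fn record_event(&self) -> mpsc::Receiver<Instant> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(Command::Event(reply_tx)).unwrap();
        reply_rx
    }
}

/// Event, launch, event, sync, then subtract the two timestamps.
fn time_with_events(stream: &SimStream, kernel: impl FnOnce() + Send + 'static) -> Duration {
    let start = stream.record_event();
    stream.launch(kernel);
    let stop = stream.record_event();
    // Blocking on the stop event plays the role of the "sync" step.
    let stop_time = stop.recv().unwrap();
    let start_time = start.recv().unwrap();
    stop_time.duration_since(start_time)
}
```

Because both timestamps are taken on the worker side, the host-side cost of enqueueing the commands does not end up in the measurement, which is the main reason to prefer events over a host clock for short kernels.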
Hi @bheisler, thanks for the suggestion. I personally don't need this any more, but it may be useful to others. I'm using https://doc.rust-lang.org/std/time/struct.Instant.html to measure time on the Rust side. For my particular case, I care about kernel execution time, not data load time (which happens rarely), so it goes something like: move data async to gpu There are probably all kinds of problems with it, but for kernels whose run time is measured in seconds, it's been fine for me so far. |
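A minimal sketch of the `Instant` approach, assuming the closure you pass in both launches the kernel and waits for it to finish. `time_on_host` is a hypothetical helper name, not something RustaCUDA provides. The wait matters because kernel launches return before the kernel has run, so without a synchronize inside the closure you would only be timing the launch call.

```rust
use std::time::{Duration, Instant};

/// Times a closure with the host wall clock and returns its result
/// together with the elapsed time.
///
/// Assumption: `launch_and_sync` blocks until the GPU work is done
/// (for example by synchronizing the stream after the launch).
fn time_on_host<T>(launch_and_sync: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = launch_and_sync();
    (out, start.elapsed())
}
```

With RustaCUDA the closure body would presumably be the kernel launch followed by `stream.synchronize()`. The measurement includes launch overhead and synchronization latency, which is usually negligible for kernels that run for seconds but not for very short ones.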
Yeah, I'll reopen this as a reminder to myself to document this later. |
There are four approaches to timing that I'm aware of:
My recommendation is definitely nvvp. It's really nice to see a detailed visual chart of your program's performance. That has helped me debug and avoid pitfalls many times already. But all of the above approaches have their merits and use-cases, so it really depends on what you're trying to achieve. Good luck! |
Huh, I didn't know about Yeah, I'd second the recommendation to use nvprof or nvvp for profiling. For simple benchmarking, I usually use events, although they aren't yet available in RustaCUDA. Hopefully they will be soon, time permitting. What I would really want for benchmarking is to use events for measurement combined with Criterion.rs for analysis, but that will take some careful development work and I haven't had time to do that work lately. |
Just for the record: (newbie here: is the support enough to refresh the roadmap in README?) |
This is now kinda old; I would recommend using Criterion.rs' Bencher::iter_custom and implementing whichever timing technique you prefer. |
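As a rough illustration of what that could look like: with the default wall-clock measurement, `Bencher::iter_custom` takes a closure that receives an iteration count and returns the total `Duration` for that many iterations. The sketch below only builds that closure shape with the standard library. `total_kernel_time` and `measure_fake_kernel` are hypothetical names, and the sleep stands in for a real launch-and-measure step.

```rust
use std::time::{Duration, Instant};

/// Runs `measure_once` `iters` times and returns the summed duration,
/// i.e. the `u64 -> Duration` shape that iter_custom asks for.
fn total_kernel_time(iters: u64, mut measure_once: impl FnMut() -> Duration) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        // Only the time reported by `measure_once` is counted, so any
        // setup done outside of it stays out of the benchmark result.
        total += measure_once();
    }
    total
}

/// Hypothetical stand-in for "launch one kernel and report how long it took".
/// A real version would use whichever timing technique you prefer.
fn measure_fake_kernel() -> Duration {
    let start = Instant::now();
    std::thread::sleep(Duration::from_millis(1));
    start.elapsed()
}
```

Inside a Criterion benchmark this would be wired up roughly as `b.iter_custom(|iters| total_kernel_time(iters, measure_fake_kernel))`, leaving the statistics and reporting to Criterion.rs.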
Can we please get examples for benchmarking / timing kernels via RustaCUDA?
I'm not familiar with how to benchmark CUDA code and would love to learn from examples.