Slice has duration of "Did not end." #311
Comments
Likely fixed by the buffer flushing fix in #317.
Please re-open if #317 does not fix this issue in the upcoming release.
Unfortunately, it seems this bug is still in Omnitrace. When I try to load LLaMa2's perfetto file, none of the kernels end.
Also, I can't seem to reopen this issue. Usually the re-open button is at the bottom, but I do not see it.
How big is the perfetto file? Could you be hitting the data limit? I ask because it's strange that the samples stop showing up. Samples are not inserted into Perfetto until finalization, but GPU kernels are inserted before then, so it would be strange (but maybe not impossible) for samples to cause the data limit to be hit and cause Perfetto to drop the rest of the records.
Wait, do the GPU kernels not end, or do the samples not end? If it's just the samples, then that really seems like a data limit issue.
If the size of the Perfetto buffer is the issue, you can either increase it (I think the default is maybe 2 GB) or disable Perfetto annotations (which will reduce the amount of data sent to Perfetto, sometimes very significantly).
To answer the questions:
Thank you.
I just checked and it looks like the default buffer limit is ~1 GB, so it sounds like you may be hitting it. No, the buffer size has nothing to do with the web UI. There is nothing you can do about any existing perfetto files. You need to recollect data with OMNITRACE_PERFETTO_BUFFER_SIZE_KB set to a larger value and/or set OMNITRACE_PERFETTO_ANNOTATIONS to OFF.
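For anyone scripting the re-collection, here is a minimal sketch of how those two settings could be passed through the environment. Only the two variable names and the OFF value come from the comment above; the helper name `perfetto_env`, its parameters, and the 4 GB figure are illustrative assumptions, not recommended values.

```python
import os


def perfetto_env(buffer_size_gb, disable_annotations=True, base_env=None):
    """Return a copy of the environment with the Perfetto settings applied.

    `perfetto_env` is a hypothetical helper for illustration, not part of Omnitrace.
    """
    env = dict(os.environ if base_env is None else base_env)
    # The setting is expressed in KB, so convert from GB.
    env["OMNITRACE_PERFETTO_BUFFER_SIZE_KB"] = str(int(buffer_size_gb * 1024 * 1024))
    if disable_annotations:
        # Fewer annotations means less data sent to Perfetto.
        env["OMNITRACE_PERFETTO_ANNOTATIONS"] = "OFF"
    return env


# Example: ask for a 4 GB buffer (an arbitrary size chosen for this sketch).
env = perfetto_env(4, base_env={})
print(env["OMNITRACE_PERFETTO_BUFFER_SIZE_KB"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` along with whatever command you normally use to collect the trace.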
@dwchang79 Internal ticket has been created to further investigate your issue. Thanks!
Hi @dwchang79, are you still experiencing this issue? If so, do you have a simple way to reproduce it?
I am no longer at AMD (was on Sabbatical there as a Visiting Scholar), but I believe it is still an issue.
Thanks for the reply! Do you recall any details about when these issues occurred? Did they only occur for a specific workload? Did you see this consistently?
When I first reported it, I was running CoralGEMM (don't remember if DGEMM or SGEMM), but later on it was LLaMa-2. And yes, I would see it consistently every run. Thank you.
Some of my omnitrace proto files "did not end" according to Perfetto. For my work, I am trying to find the end time for certain kernels and when they do not end, it leads to a -1 run time. I heard this is a known issue, but I was told to formally submit this bug (and my other 2) so that they can be properly tracked.
I have attached a screenshot of the behavior. In it you should see the "samples [omnitrace]" slice continue and become white at the end/right and also near the bottom left you can see that the "Duration" says "Did not end."
Thank you.
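As a stopgap while analyzing such traces, unfinished slices can be filtered out before computing end times. The sketch below assumes the slices have already been exported as records with `name`, `ts`, and `dur` fields (a hypothetical layout, with example kernel names made up for illustration), and that a slice that did not end carries a duration of -1 as described above.

```python
def end_time(slice_record):
    """Return ts + dur, or None when the slice did not end (negative duration)."""
    if slice_record["dur"] < 0:
        return None
    return slice_record["ts"] + slice_record["dur"]


def split_unfinished(slices):
    """Separate slices with a valid duration from those that did not end."""
    finished = [s for s in slices if s["dur"] >= 0]
    unfinished = [s for s in slices if s["dur"] < 0]
    return finished, unfinished


# Hypothetical records for illustration only.
slices = [
    {"name": "kernel_a", "ts": 100, "dur": 50},
    {"name": "kernel_b", "ts": 200, "dur": -1},  # reported as "Did not end"
]
finished, unfinished = split_unfinished(slices)
print([s["name"] for s in unfinished])
```

This only flags the affected slices so they do not silently skew results; it does not recover the missing end times.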