Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[src] use CL_PROFILING_COMMAND_END as latency time #67

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

alohali
Copy link

@alohali alohali commented Apr 16, 2020

CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED is real kernel latency

@alohali
Copy link
Author

alohali commented Apr 16, 2020

Is it more accurate to test kernel latency with CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED and run a extreme small kernel?
see >20us difference on several ARM MALI GPU device.

@krrishnarraj
Copy link
Owner

Thanks.
I agree with the small kernel part.
I am seeing more latency for cpu platforms like pocl. How can 'CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED' give better accuracy wrt CL_PROFILING_COMMAND_START?

@alohali
Copy link
Author

alohali commented Apr 23, 2020

Thanks.
I agree with the small kernel part.
I am seeing more latency for cpu platforms like pocl. How can 'CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED' give better accuracy wrt CL_PROFILING_COMMAND_START?

Because kernel launch latency contains pre-launch, post-launch latency and other execution latency. CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED only calculates pre launch parts but not post launch parts. CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_QUEUED includes both pre and post. The real kernel execution time is almost zero.

@nchristensen
Copy link
Contributor

nchristensen commented Oct 7, 2022

From https://stackoverflow.com/questions/39924433/opencl-events-ambiguity it seems to me that CL_PROFILING_COMMAND_SUBMIT - CL_PROFILING_COMMAND_START is the pre-execution latency. CL_PROFILING_COMMAND_COMPLETE was added in OpenCL 2.0. I'm guessing CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_END is the post-execution latency.

There may also a lower bound on CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START which might be another form of latency.

So CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_SUBMIT on very small kernel may be a way to measure the latency.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants