[SYCL] Enable host optimization of work-item free functions #2967

Merged

Contributor

@cperkinsintel cperkinsintel commented Dec 29, 2020

The SYCL free functions (this_item, this_id, etc.) are expensive to support on host devices. On every iteration of one of the parallel_for routines, the various indexing values have to be updated in case the user's code calls this_item or this_id (or one of the others). With the new callsThisItem method added to the Kernel Information (thanks @rdeodhar!), the host device can avoid paying that performance penalty when the user's code doesn't actually call this_item. We can detect at compile time whether any of the this_xxx free functions are used by the user's code and, if not, skip storing the indexing data in each loop iteration.

In this PR we further expand the Kernel Information to support a callsAnyThisFreeFunction method, and we use it to avoid the sundry store_item etc. calls on the host.
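
As a rough illustration of the idea described above (not the runtime code from this PR), the following C++17 sketch shows a host loop that only stores per-iteration index data when a compile-time query says a free function might read it. Every name here (`KernelInfoSketch`, `hostParallelFor`, `storeIndex`, `thisIndexSketch`, `storesFor`) is hypothetical and chosen only for the example; only `callsAnyThisFreeFunction` is a name taken from the description.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-kernel information; the real structure
// generated for a kernel is not reproduced here.
template <bool UsesFreeFunctions> struct KernelInfoSketch {
  static constexpr bool callsAnyThisFreeFunction() { return UsesFreeFunctions; }
};

// Hypothetical per-thread slot that a this_id()-style free function would read.
thread_local std::size_t CurrentIndex = 0;
// Counts stores so the effect of skipping them can be observed.
std::size_t StoreCount = 0;

void storeIndex(std::size_t I) {
  CurrentIndex = I;
  ++StoreCount;
}

// Hypothetical free-function query, analogous in spirit to this_id().
std::size_t thisIndexSketch() { return CurrentIndex; }

template <typename KI, typename Kernel>
void hostParallelFor(std::size_t N, Kernel K) {
  for (std::size_t I = 0; I < N; ++I) {
    // Decided at compile time: kernels that never call a free function
    // do not pay for the store on each iteration.
    if constexpr (KI::callsAnyThisFreeFunction())
      storeIndex(I);
    K(I);
  }
}

// Runs an empty kernel N times and reports how many index stores happened.
template <typename KI> std::size_t storesFor(std::size_t N) {
  StoreCount = 0;
  hostParallelFor<KI>(N, [](std::size_t) {});
  return StoreCount;
}
```

With this sketch, `storesFor<KernelInfoSketch<false>>(8)` performs no stores at all, while `storesFor<KernelInfoSketch<true>>(8)` performs one store per iteration.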

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
…ons and checking for any usage to optimize host kernel tasks.

Signed-off-by: Chris Perkins <chris.perkins@intel.com>
@cperkinsintel cperkinsintel marked this pull request as ready for review December 30, 2020 00:30
Comment on lines +3939 to +3943

    O << (K.FreeFunctionCalls.CallsThisId ||
          K.FreeFunctionCalls.CallsThisItem ||
          K.FreeFunctionCalls.CallsThisNDItem ||
          K.FreeFunctionCalls.CallsThisGroup)
      << "; }\n";
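
For readers unfamiliar with this pattern, here is a minimal, self-contained sketch of what the lines under review appear to do: OR four recorded flags together and stream the result as the body of a generated function. It uses `std::ostringstream` rather than the compiler's own stream class, and the emitted function signature is an assumption, since the snippet above only shows the tail of the statement. `FreeFunctionCallsSketch` and `emitCallsAny` are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Hypothetical mirror of the four flags referenced in the snippet above.
struct FreeFunctionCallsSketch {
  bool CallsThisId = false;
  bool CallsThisItem = false;
  bool CallsThisNDItem = false;
  bool CallsThisGroup = false;
};

// Emits one line of generated header text. The "static constexpr bool ..."
// prefix is assumed for illustration; only the trailing "; }\n" is taken
// from the reviewed code.
std::string emitCallsAny(const FreeFunctionCallsSketch &F) {
  std::ostringstream O;
  O << "static constexpr bool callsAnyThisFreeFunction() { return ";
  // A bool streamed without std::boolalpha is written as 1 or 0.
  O << (F.CallsThisId || F.CallsThisItem || F.CallsThisNDItem ||
        F.CallsThisGroup)
    << "; }\n";
  return O.str();
}
```

Because the four flags are collapsed into a single value at emission time, the generated text cannot tell the consumer which free function was called, which is the point the review comments that follow discuss.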
Contributor

@Fznamznon Fznamznon Dec 30, 2020

It feels like, if the final information in the integration header doesn't say exactly which free function is called, we don't need four different flags and four identical handlers (I mean SYCLIntegrationHeader::setCallsThisItem and the others here) in the front end either.

Contributor

On the other hand, I guess the host runtime could be optimized if we know exactly which one is used...

Contributor

@intel/llvm-reviewers-runtime, WDYT?

Contributor Author

Pre-existing code unrelated to this PR needs to know whether this_item is called and isn't concerned with the other free functions. The optimization in this PR just needs to know whether any of them are called.
My thinking is that it's best to make a proper record of each free function now. Recording only "this_item" and "anything" seems sloppy.
On the API side, though, the functions match our current needs. The pre-existing callsThisItem() and the new callsAnyThisFreeFunction() are the two affordances, and they can easily be expanded in the future if required.
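
To make the trade-off concrete, here is a small sketch of the design being argued for: record one flag per free function, but expose only the two queries that are needed today. The struct and its member names are hypothetical; only `callsThisItem` and `callsAnyThisFreeFunction` come from the discussion.

```cpp
// Hypothetical record of which free functions a kernel calls.
struct FreeFunctionUseSketch {
  bool ThisId = false;
  bool ThisItem = false;
  bool ThisNDItem = false;
  bool ThisGroup = false;

  // Query needed by the pre-existing code path.
  constexpr bool callsThisItem() const { return ThisItem; }

  // Query needed by the host optimization: true if any flag is set.
  constexpr bool callsAnyThisFreeFunction() const {
    return ThisId || ThisItem || ThisNDItem || ThisGroup;
  }
};

// Example record: a kernel that only calls the group query.
constexpr FreeFunctionUseSketch OnlyGroup{false, false, false, true};

static_assert(OnlyGroup.callsAnyThisFreeFunction(), "group use counts as any");
static_assert(!OnlyGroup.callsThisItem(), "group use is not this_item use");
```

Under this layout, a later per-function query (for example, one for the ND-item flag) would be a one-line accessor over data that is already recorded, with no change to how the flags are collected.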

Contributor

Do you expect it to be required?

Contributor Author

I don't know of anything pending. But I'd wager at even odds on needing to know about this_nd_item or this_group usage in the future.

Contributor

Alright thanks! I'm ok with the FE changes.

Contributor

@keryell keryell left a comment

You can also update the PR description which is completely cryptic.
Perhaps with "Enable host optimization of work-item free functions" for the title and a few paragraphs of what is happening here and why.

@cperkinsintel cperkinsintel changed the title [SYCL] free function host optimization [SYCL] Enable host optimization of work-item free functions Dec 30, 2020
Contributor

@elizabethandrews elizabethandrews left a comment

FE changes LGTM, other than the unresolved comment. I am alright with keeping the 4 different flags for now, if we expect to differentiate between free functions in the future.

Contributor

@elizabethandrews elizabethandrews left a comment

FE changes LGTM.

Contributor

@elizabethandrews elizabethandrews left a comment

FE changes LGTM

Contributor

@v-klochkov v-klochkov left a comment

Looks Good To Me.
I also did not find any changes that would break backward compatibility.

Contributor

@rbegam rbegam left a comment

sycl changes LGTM.
there is a pre-commit fail though.

@pvchupin pvchupin merged commit b83a1a8 into intel:sycl Jan 8, 2021
Contributor

pvchupin commented Jan 8, 2021

The failure is flaky; @v-klochkov hit the same one elsewhere and is going to open an issue.

jsji pushed a commit that referenced this pull request Feb 25, 2025
With opaque pointers, this is equivalent to calling
`IRBuilder::CreateAddrSpaceCast`.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@0d3a1a207dc6dc5