
[SYCL][L0][CUDA][HIP] Fix PI_KERNEL_GROUP_INFO_GLOBAL_WORK_SIZE queries #8769


Merged
merged 8 commits into intel:sycl from fix8766 on Apr 12, 2023

Conversation

@abagusetty (Contributor) commented Mar 24, 2023

Addresses the kernel query global_work_size for L0, CUDA, and HIP via PI_KERNEL_GROUP_INFO_GLOBAL_WORK_SIZE.
Fixes #8766.
For instance (for the X dimension):
L0: maxGroupSizeX * maxGroupCountX
CUDA: CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X * CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X
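As a minimal sketch (not the PR's exact code), the per-dimension product above can be computed with the CUDA driver API as below; the device index 0 and the printout are illustrative assumptions, while the attribute names match those listed above.

// Hedged sketch: derive the maximum global work size per dimension on CUDA
// as max block dimension * max grid dimension, per the PR description.
#include <cuda.h>
#include <array>
#include <cstddef>
#include <cstdio>

int main() {
  cuInit(0);
  CUdevice dev;
  cuDeviceGet(&dev, 0); // assumption: query the first visible device

  const CUdevice_attribute blockAttrs[3] = {
      CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X,
      CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y,
      CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z};
  const CUdevice_attribute gridAttrs[3] = {
      CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X,
      CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y,
      CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z};

  std::array<size_t, 3> globalWorkSize{};
  for (int i = 0; i < 3; ++i) {
    int maxBlockDim = 0, maxGridDim = 0;
    cuDeviceGetAttribute(&maxBlockDim, blockAttrs[i], dev);
    cuDeviceGetAttribute(&maxGridDim, gridAttrs[i], dev);
    // Widen before multiplying: the product easily overflows 32 bits.
    globalWorkSize[i] =
        static_cast<size_t>(maxBlockDim) * static_cast<size_t>(maxGridDim);
  }
  std::printf("max global work size: %zu x %zu x %zu\n", globalWorkSize[0],
              globalWorkSize[1], globalWorkSize[2]);
  return 0;
}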

@jinz2014 (Contributor):

I have a question. Is the max global work size independent of the global work size set in a host program for a kernel?

@abagusetty (Contributor, Author):

/verify with intel/llvm-test-suite#1694

@bader (Contributor) commented Mar 24, 2023

@abagusetty, FYI: the "verify with" command does not validate on CUDA/HIP platforms.
I recommend adding this test to sycl/test-e2e to validate on CUDA/HIP platforms.
+@aelovikov-intel.

@abagusetty (Contributor, Author):

Thanks, I stumbled over that too and looked at the wording in the spec, which made me think it could be the max global limits:

> The exact semantics of this descriptor are defined by each SYCL backend specification, but the intent is to return the kernel’s maximum global work size.
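For context, a hedged usage sketch of the descriptor from application code: QueryKernel is a hypothetical name, and SYCL 2020 only guarantees this query for built-in kernels or custom devices; the point of this PR is that the L0/CUDA/HIP plugins now answer it with the device's maximum global work size.

#include <sycl/sycl.hpp>
#include <iostream>

class QueryKernel; // hypothetical kernel name, used only for this sketch

int main() {
  sycl::queue q;
  q.single_task<QueryKernel>([] {}).wait(); // make the kernel known

  auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
      q.get_context());
  sycl::kernel krn = bundle.get_kernel(sycl::get_kernel_id<QueryKernel>());

  // With this PR, L0/CUDA/HIP report the device-wide maximum global work
  // size instead of failing the query.
  sycl::range<3> gws =
      krn.get_info<sycl::info::kernel_device_specific::global_work_size>(
          q.get_device());
  std::cout << gws[0] << " x " << gws[1] << " x " << gws[2] << "\n";
}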

@jinz2014 (Contributor):

The global work sizes from the query will be the same for all kernels, right?

@abagusetty (Contributor, Author):

> The global work sizes from the query will be the same for all kernels, right?

Yes, since the descriptor is a kernel_device_specific one: any kernel (on a custom device type, or a built-in kernel) returns the device-specific global work sizes, which in turn should be the same for all kernels IMO.

@abagusetty abagusetty marked this pull request as ready for review March 28, 2023 18:20
@abagusetty abagusetty requested review from a team as code owners March 28, 2023 18:20
@@ -42,6 +42,11 @@
#include <unordered_map>
#include <vector>

// Helper for one-liner validation
#define PI_ASSERT(condition, error) \
Review comment (Contributor):

It's a bit misleading, as it does not assert on the condition; maybe consider renaming it?

@abagusetty (Contributor, Author):

Renamed PI_ASSERT to PI_ERR_CHECK.
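A minimal sketch of what such a one-liner check plausibly looks like after the rename, assuming the usual early-return pattern in the PI plugins (the exact body in the PR may differ):

// Hedged sketch: return the PI error code when the condition fails, rather
// than asserting, which is what motivated the rename away from PI_ASSERT.
#define PI_ERR_CHECK(condition, error)                                        \
  if (!(condition))                                                           \
    return error;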

@abagusetty (Contributor, Author):

Gentle ping @smaslov-intel @jchlanda

@jandres742 (Contributor) left a comment:

+1 on L0 changes.

@steffenlarsen (Contributor) left a comment:

Sorry for the delay. I think these changes look good. I am a little curious which built-in kernels they would apply to, but I assume CUDA, HIP, and L0 guarantee the full possible work sizes either way.

@steffenlarsen steffenlarsen merged commit d666b95 into intel:sycl Apr 12, 2023
@abagusetty (Contributor, Author):

> Sorry for the delay. I think these changes look good. I am a little curious which built-in kernels they would apply to, but I assume CUDA, HIP, and L0 guarantee the full possible work sizes either way.

Thanks for the feedback on the built-ins; I stumbled over that a bit too, but convinced myself that they see the complete device limits.

@abagusetty abagusetty deleted the fix8766 branch April 12, 2023 23:32
jandres742 pushed commits referencing this pull request (intel#8769) to jandres742/llvm on Apr 21, Apr 24, May 3, May 16, May 23, and May 26, 2023.
Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
Labels: none. Projects: none.
Linked issue closed by this pull request: CUDA and HIP backends do not support PI_KERNEL_GROUP_INFO_GLOBAL_WORK_SIZE (#8766)
7 participants