[SYCL][CUDA] Port CUDA plugin to Unified Runtime #9512

callumfare · 2023-05-18T09:50:47Z

This moves the CUDA plugin implementation to Unified Runtime and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum, so the result should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (#8744), so it will only be ready once that is merged.

JackAKirk · 2023-06-05T13:08:00Z

sycl/plugins/unified_runtime/ur/adapters/cuda/common.cpp

+
+#include <sstream>
+
+ur_result_t map_error_ur(CUresult result) {


I think going forward we agreed that this mapping from cuResult to urResult should be removed as explained here: oneapi-src/unified-runtime#500 (comment)

I understand that you may not want to do it at this point. However, it may actually be easier to do it now, and it would allow a lot of redundant code to be removed, so FYI.

callumfare · 2023-06-06

We're trying to keep the scope of this PR, and of the porting effort in general, to a straight port of the existing code, to avoid any changes to the behavior of the plugin/adapter, so I don't think it makes sense to do it at this stage. Plus, the size of the PR means that reviewing any actual functional changes at the same time would be tricky.

It also requires a resolution to oneapi-src/unified-runtime#500.

JackAKirk · 2023-06-05T13:08:37Z

sycl/plugins/unified_runtime/ur/adapters/cuda/common.cpp

+  }
+}
+
+ur_result_t check_error_ur(CUresult result, const char *function, int line,


Same as https://github.com/intel/llvm/pull/9512/files#r1218050610

JackAKirk · 2023-06-05T13:43:32Z

sycl/plugins/unified_runtime/ur/adapters/cuda/device.cpp

+
+int getAttribute(ur_device_handle_t device, CUdevice_attribute attribute) {
+  int value;
+  sycl::detail::ur::assertion(


There are also cases like these, where assertion is used instead of check_error. This will also lead to lost native error information (I think, although I haven't easily found the definition of ur::assertion). All such cases (all calls to cu* functions) should set the last message and report a plugin-specific error, as described here: oneapi-src/unified-runtime#500 (comment), when the result is not CUDA_SUCCESS.
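As a rough illustration of the pattern suggested above (record the native failure, then report a single adapter-specific error instead of translating every native code), here is a minimal sketch. It deliberately avoids the CUDA and UR headers: `NativeResult`, `AdapterResult`, `checkNative`, `LastErrorMessage` and `LastNativeCode` are all hypothetical stand-ins invented for this example, not the names or signatures used in the adapter.

```cpp
#include <string>

// Hypothetical stand-ins so the sketch compiles without CUDA or UR headers.
enum class NativeResult { Success = 0, InvalidValue = 1, OutOfMemory = 2 };
enum class AdapterResult { Success, AdapterSpecificError };

// Assumed design: a per-thread record of the most recent native failure,
// which the caller can query after seeing AdapterSpecificError.
thread_local std::string LastErrorMessage;
thread_local int LastNativeCode = 0;

// Human-readable name for a native result code.
const char *nativeErrorName(NativeResult R) {
  switch (R) {
  case NativeResult::Success:
    return "SUCCESS";
  case NativeResult::InvalidValue:
    return "INVALID_VALUE";
  case NativeResult::OutOfMemory:
    return "OUT_OF_MEMORY";
  }
  return "UNKNOWN";
}

// On failure, keep the native code and a message instead of mapping the
// native code onto a generic result, so no native information is lost.
AdapterResult checkNative(NativeResult R, const char *Function, int Line) {
  if (R == NativeResult::Success)
    return AdapterResult::Success;
  LastNativeCode = static_cast<int>(R);
  LastErrorMessage = std::string(Function) + ":" + std::to_string(Line) +
                     ": native error " + nativeErrorName(R);
  return AdapterResult::AdapterSpecificError;
}
```

A caller would wrap each native call as `checkNative(someNativeCall(), __func__, __LINE__)` and, on `AdapterSpecificError`, read the stored message and code to recover what the native API actually reported.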

JackAKirk · 2023-06-05T14:19:11Z

I've added some comments that are basically criticisms of PI, along with changes that I think we already agreed on regarding error handling. This could be a good opportunity to make these error handling changes.

sycl/plugins/unified_runtime/ur/adapters/cuda/event.cpp

sycl/plugins/unified_runtime/ur/adapters/cuda/common.cpp

sycl/plugins/unified_runtime/ur/adapters/cuda/context.cpp

sycl/plugins/unified_runtime/ur/adapters/cuda/device.cpp

ldrumm

In general I think this is an excellent port, and within the constraints of working to an existing API spec it's very worthy. I'm no expert on PI and runtimes in general, so most of my comments are on a function / documentation level rather than an architectural level.

However, I'd really like to see some of the decisions around code style aligned more closely with upstream. I understand some of this is in progress (e.g. naming conventions as discussed with @jchlanda), but I'd like to reiterate how important ergonomics of programming are. A couple of comments I've made are of high priority to me (hidden control flow in macros that buy you nothing) because making the code readable and clear at first glance is critical to understanding.

sycl/plugins/unified_runtime/ur/adapters/cuda/enqueue.cpp

sycl/plugins/unified_runtime/ur/adapters/cuda/ur_interface_loader.cpp

ldrumm · 2023-06-07T09:34:37Z

sycl/plugins/unified_runtime/pi2ur.hpp

+  case PI_EXT_CODEPLAY_DEVICE_INFO_MAX_REGISTERS_PER_WORK_GROUP: {
+    InfoType = UR_EXT_DEVICE_INFO_MAX_REGISTERS_PER_WORK_GROUP;
+    break;
+  }
  default:
    return PI_ERROR_UNKNOWN;
  };

  PI_ASSERT(Device, PI_ERROR_INVALID_DEVICE);


I know I'm a bit late to the party as this is already in the codebase, but it really troubles me that we have macros that hide flow control. It's one extra line to expand the macro, and makes the control flow much more obvious to anyone reading. Additionally, it's not an assertion of any kind (which is about ensuring invariants about the design of the system are true); it's a simple parameter check for user input.

Same goes for HANDLE_ERRORS.

There's zero ergonomic benefit: whenever you wrap an expression in HANDLE_ERRORS you make the line longer, clang-format splits it across lines, and it absolutely confounds stepping in a debugger.

Please use this as an opportunity to not reinforce this broken idiom.
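To make the point about hidden control flow concrete, here is a small sketch contrasting the two styles. `SKETCH_ASSERT`, `SketchResult` and `SketchDevice` are hypothetical names made up for this example; the macro is only assumed to resemble the early-return shape of PI_ASSERT and is not its actual definition.

```cpp
// Hypothetical result codes and device type, for illustration only.
enum SketchResult { SKETCH_SUCCESS = 0, SKETCH_ERROR_INVALID_DEVICE = 1 };
struct SketchDevice {
  int MaxRegisters;
};

// Macro form: the `return` is invisible at the call site.
#define SKETCH_ASSERT(Condition, Error)                                        \
  if (!(Condition))                                                            \
    return Error;

SketchResult getInfoWithMacro(const SketchDevice *Device, int *Out) {
  SKETCH_ASSERT(Device, SKETCH_ERROR_INVALID_DEVICE); // may return here
  *Out = Device->MaxRegisters;
  return SKETCH_SUCCESS;
}

// Explicit form: one extra line, and the early return is obvious to a
// reader and easy to step through in a debugger.
SketchResult getInfoExplicit(const SketchDevice *Device, int *Out) {
  if (!Device)
    return SKETCH_ERROR_INVALID_DEVICE;
  *Out = Device->MaxRegisters;
  return SKETCH_SUCCESS;
}
```

Both functions behave identically; the only difference is whether the parameter check and its early return are visible where they happen.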

sycl/plugins/unified_runtime/ur/adapters/cuda/program.cpp

callumfare · 2023-06-14T10:28:26Z

@intel/llvm-gatekeepers Please merge this when possible

kbenzie · 2023-06-14T11:54:06Z

We are working on a fix to this issue in the post merge actions.

Resolves the warnings as errors reported in [post merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360) as a result of merging intel#9512. Additionally move pre-processor guards to resolve unused global variables which would also fail in this build configuration (clang & SYCL_ENABLE_WERROR=ON).

kbenzie · 2023-06-14T12:28:55Z

We are working on a fix to this issue in the post merge actions.

Fixed in #9872

Resolves the warnings as errors reported in [post merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360) as a result of merging #9512. Additionally move pre-processor guards to resolve unused global variables which would also fail in this build configuration (clang & SYCL_ENABLE_WERROR=ON).
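The "unused global variable" class of failure mentioned above can be sketched as follows. This is not the code from the fix; `SKETCH_ENABLE_TRACING`, `TracePrefix` and `formatMessage` are hypothetical names chosen for illustration. The idea is that a file-scope variable used only inside a guarded block should itself sit inside the same guard, otherwise builds with the guard disabled can see it as unused and fail when warnings are treated as errors.

```cpp
#include <string>

// Problematic shape (shown as a comment): defined unconditionally, but only
// referenced when SKETCH_ENABLE_TRACING is defined, so it can be reported as
// unused in configurations where the macro is absent.
//
//   static const char *TracePrefix = "[trace] ";

// Guarded shape: the definition lives under the same guard as its only use.
#ifdef SKETCH_ENABLE_TRACING
static const char *TracePrefix = "[trace] ";
#endif

std::string formatMessage(const std::string &Message) {
#ifdef SKETCH_ENABLE_TRACING
  return TracePrefix + Message;
#else
  return Message; // no reference to TracePrefix in this configuration
#endif
}
```

With the macro undefined, `formatMessage("x")` simply returns its argument and the translation unit contains no unused file-scope variable.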

This moves the CUDA plugin implementation to Unified Runtime; and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (intel#8744) so will only be ready when that is merged.

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>

This moves the HIP plugin implementation to Unified Runtime; and changes the pi_hip plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR.

This PR is based on top of the CUDA adapter (#9512) so will only be ready when that is merged.

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>

callumfare requested review from a team as code owners May 18, 2023 09:50

callumfare requested review from npmiller and cperkinsintel May 18, 2023 09:50

callumfare marked this pull request as draft May 18, 2023 09:51

cperkinsintel requested a review from smaslov-intel May 18, 2023 16:27

omarahmed1111 mentioned this pull request May 26, 2023

[SYCL][HIP] Port HIP plugin to Unified Runtime #9617

Merged

callumfare force-pushed the cuda_ur_port branch from 1644bdf to a0733d6 Compare May 31, 2023 10:42

callumfare force-pushed the cuda_ur_port branch from ca63507 to 3c2df7c Compare May 31, 2023 13:14

callumfare force-pushed the cuda_ur_port branch 2 times, most recently from cbfad32 to f1bba52 Compare June 5, 2023 08:31

JackAKirk reviewed Jun 5, 2023

GeorgeWeb reviewed Jun 5, 2023

callumfare force-pushed the cuda_ur_port branch from f1bba52 to 850709d Compare June 6, 2023 08:21

jchlanda reviewed Jun 6, 2023

ldrumm requested changes Jun 7, 2023

callumfare and others added 10 commits June 14, 2023 10:07

Mark KernelFusion/sync_two_queues_event_dep as unsupported on cuda pending further investigation

0011b91

[SYCL][CUDA] Fix assumption about work dimensions in EnqueueKernelLaunch.

9e97af7

[SYCL][CUDA] Correct return type of cuda USM capability queries.

b538dd8

[SYCL][CUDA] A number of small cuda adapter fixes for cts/spec compliance.

9811f9b

[SYCL][UR] Avoid zero-length new in pi2ur.

fce479c

[SYCL][CUDA] Mass fixup of code style in the CUDA adapter

9b3448a

[SYCL][CUDA][PI][UR] Fix PR review comments

a0de2d7

[SYCL][CUDA] Tidy CMakeLists.txt

2a50972

Fix various build warnings

c39e794

Address more review feedback

b64fcbd

callumfare force-pushed the cuda_ur_port branch from 9ca8484 to b64fcbd Compare June 14, 2023 09:26

dm-vodopyanov merged commit ec59d44 into intel:sycl Jun 14, 2023

kbenzie mentioned this pull request Jun 14, 2023

[SYCL][CUDA] Fix post merge errors from #9512 #9872

Merged

kbenzie mentioned this pull request Jun 14, 2023

Add targets to run CTS tests with validation layer. oneapi-src/unified-runtime#613

Closed

dm-vodopyanov mentioned this pull request Jun 16, 2023

[SYCL][UR] Update Unified Runtime tag to support UR_DEVICE_INFO_IP_VERSION #9873

Merged
