Skip to content

[SYCL][CUDA] Port CUDA plugin to Unified Runtime #9512

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 45 commits into from
Jun 14, 2023

Conversation

callumfare
Copy link
Contributor

This moves the CUDA plugin implementation to Unified Runtime; and changes the pi_cuda plugin to use pi2ur to implement PI. The changes to the implementation have been kept to a minimum and should be functionally the same. Documentation and comments have been moved verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (#8744) so will only be ready when that is merged.

@callumfare callumfare requested review from a team as code owners May 18, 2023 09:50
@callumfare callumfare marked this pull request as draft May 18, 2023 09:51
@callumfare callumfare temporarily deployed to aws May 18, 2023 09:53 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws May 31, 2023 12:25 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws May 31, 2023 13:03 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws May 31, 2023 13:41 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws May 31, 2023 14:24 — with GitHub Actions Inactive
@callumfare callumfare force-pushed the cuda_ur_port branch 2 times, most recently from cbfad32 to f1bba52 Compare June 5, 2023 08:31
@callumfare callumfare temporarily deployed to aws June 5, 2023 09:13 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws June 5, 2023 09:56 — with GitHub Actions Inactive

#include <sstream>

ur_result_t map_error_ur(CUresult result) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think going forward we agreed that this mapping from cuResult to urResult should be removed as explained here: oneapi-src/unified-runtime#500 (comment)

I understand maybe you don't want to do it at this point. However it actually may be easier to do it at this point and does allow the removal of a lot of redundant code so FYI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to keep the scope of this PR and the porting to effort to just a straight port of the existing code to avoid any changes to the behavior of the plugin/adapter, so I don't think it makes sense to do it at this stage. Plus the size of the PR means that reviewing any actual functional changes at the same time would be tricky.

It also requires a resolution to oneapi-src/unified-runtime#500

}
}

ur_result_t check_error_ur(CUresult result, const char *function, int line,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


int getAttribute(ur_device_handle_t device, CUdevice_attribute attribute) {
int value;
sycl::detail::ur::assertion(
Copy link
Contributor

@JackAKirk JackAKirk Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also cases like these, where assertion is used instead of check_error. This will also lead to lost native error information (I think, although I haven't easily found the definition of ur::assertion). All such cases (all calls to cu* functions) should be setting the last message and reporting a plugin specific error as described here: oneapi-src/unified-runtime#500 (comment) when the result in not CUDA_SUCCESS

@JackAKirk
Copy link
Contributor

I've added some comments that are basically criticisms of PI and changes we already agreed I think regarding error handling. Could make sense to use this as a good opportunity to make these error handling changes.

@callumfare callumfare temporarily deployed to aws June 6, 2023 08:58 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws June 6, 2023 09:40 — with GitHub Actions Inactive
Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this is an excellent port, and within the constraints of working to an existing API spec it's very worthy. I'm no expert on PI and runtimes in general so most of my comments are on a function / documentation level rather than an architectural level

However, I'd really like to see some of the decisions around code style aligned more closely with upstream. I understand some of this is in progress (e.g. naming conventions as discussed with @jchlanda), but I'd like to reiterate how important ergonomics of programming are. A couple of comments I've made are of high priority to me (hidden control flow in macros that buy you nothing) because making the code readable and clear at first glance is critical to understanding

case PI_EXT_CODEPLAY_DEVICE_INFO_MAX_REGISTERS_PER_WORK_GROUP: {
InfoType = UR_EXT_DEVICE_INFO_MAX_REGISTERS_PER_WORK_GROUP;
break;
}
default:
return PI_ERROR_UNKNOWN;
};

PI_ASSERT(Device, PI_ERROR_INVALID_DEVICE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm a bit late to the party as this is already in the codebase, but it really troubles me that we have macros that hide flow control. It's one extra line to expand the macro, and makes the control flow much more obvious to anyone reading. Additionally, it's not an assertion of any kind (which is about ensuring invariants about the design of the system are true); it's a simple parameter check for user input.

Same goes for HANDLE_ERRORS.

There's zero ergonomic benefit as whenever you wrap an expression in HANDLE_ERRORS you make the line length longer, clang-format splits it across lines, and it absolutely confounds stepping in a debugger.

Please use this as an opportunity to not reinforce this broken idiom

Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this is an excellent port, and within the constraints of working to an existing API spec it's very worthy. I'm no expert on PI and runtimes in general so most of my comments are on a function / documentation level rather than an architectural level

However, I'd really like to see some of the decisions around code style aligned more closely with upstream. I understand some of this is in progress (e.g. naming conventions as discussed with @jchlanda), but I'd like to reiterate how important ergonomics of programming are. A couple of comments I've made are of high priority to me (hidden control flow in macros that buy you nothing) because making the code readable and clear at first glance is critical to understanding

@callumfare callumfare temporarily deployed to aws June 14, 2023 09:41 — with GitHub Actions Inactive
@callumfare callumfare temporarily deployed to aws June 14, 2023 10:21 — with GitHub Actions Inactive
@callumfare
Copy link
Contributor Author

@intel/llvm-gatekeepers Please merge this when possible

@dm-vodopyanov dm-vodopyanov merged commit ec59d44 into intel:sycl Jun 14, 2023
@kbenzie
Copy link
Contributor

kbenzie commented Jun 14, 2023

We are working on a fix to this issue in the post merge actions.

kbenzie added a commit to kbenzie/intel-llvm that referenced this pull request Jun 14, 2023
Resolves the warnings as errors reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
as a result of merging intel#9512. Additionally move pre-processor guards to
resolve unused global variables which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
@kbenzie
Copy link
Contributor

kbenzie commented Jun 14, 2023

We are working on a fix to this issue in the post merge actions.

Fixed in #9872

steffenlarsen pushed a commit that referenced this pull request Jun 14, 2023
Resolves the warnings as errors reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
as a result of merging #9512. Additionally move pre-processor guards to
resolve unused global variables which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
fineg74 pushed a commit to fineg74/llvm that referenced this pull request Jun 15, 2023
This moves the CUDA plugin implementation to Unified Runtime; and
changes the pi_cuda plugin to use pi2ur to implement PI. The changes to
the implementation have been kept to a minimum and should be
functionally the same. Documentation and comments have been moved
verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (intel#8744) so will only
be ready when that is merged.

---------

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
fineg74 pushed a commit to fineg74/llvm that referenced this pull request Jun 15, 2023
Resolves the warnings as errors reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
as a result of merging intel#9512. Additionally move pre-processor guards to
resolve unused global variables which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
steffenlarsen pushed a commit that referenced this pull request Jul 13, 2023
This moves the HIP plugin implementation to Unified Runtime; and changes
the pi_hip plugin to use pi2ur to implement PI. The changes to the
implementation have been kept to a minimum and should be functionally
the same. Documentation and comments have been moved verbatim, other
than changing PI references to UR.

This PR is based on top of the CUDA adapter
(#9512) so will only be ready when
that is merged.

---------

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
fabiomestre pushed a commit to fabiomestre/llvm that referenced this pull request Sep 26, 2023
This moves the CUDA plugin implementation to Unified Runtime; and
changes the pi_cuda plugin to use pi2ur to implement PI. The changes to
the implementation have been kept to a minimum and should be
functionally the same. Documentation and comments have been moved
verbatim, other than changing PI references to UR.

This PR is based on top of the Level Zero adapter (intel#8744) so will only
be ready when that is merged.

---------

Co-authored-by: Petr Vesely <petr.vesely@codeplay.com>
Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Martin Morrison-Grant <martin.morrisongrant@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
fabiomestre pushed a commit to fabiomestre/llvm that referenced this pull request Sep 26, 2023
Resolves the warnings as errors reported in [post
merge](https://github.com/intel/llvm/actions/runs/5266121277/jobs/9519634360)
as a result of merging intel#9512. Additionally move pre-processor guards to
resolve unused global variables which would also fail in this build
configuration (clang & SYCL_ENABLE_WERROR=ON).
veselypeta added a commit to veselypeta/llvm that referenced this pull request Sep 28, 2023
This moves the HIP plugin implementation to Unified Runtime; and changes
the pi_hip plugin to use pi2ur to implement PI. The changes to the
implementation have been kept to a minimum and should be functionally
the same. Documentation and comments have been moved verbatim, other
than changing PI references to UR.

This PR is based on top of the CUDA adapter
(intel#9512) so will only be ready when
that is merged.

---------

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
szadam pushed a commit to szadam/unified-runtime that referenced this pull request Oct 13, 2023
This moves the HIP plugin implementation to Unified Runtime; and changes
the pi_hip plugin to use pi2ur to implement PI. The changes to the
implementation have been kept to a minimum and should be functionally
the same. Documentation and comments have been moved verbatim, other
than changing PI references to UR.

This PR is based on top of the CUDA adapter
(intel/llvm#9512) so will only be ready when
that is merged.

---------

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
omarahmed1111 added a commit to omarahmed1111/unified-runtime that referenced this pull request Oct 23, 2023
This moves the HIP plugin implementation to Unified Runtime; and changes
the pi_hip plugin to use pi2ur to implement PI. The changes to the
implementation have been kept to a minimum and should be functionally
the same. Documentation and comments have been moved verbatim, other
than changing PI references to UR.

This PR is based on top of the CUDA adapter
(intel/llvm#9512) so will only be ready when
that is merged.

---------

Co-authored-by: Omar Ahmed <omar.ahmed@codeplay.com>
Co-authored-by: Petr Vesely <veselypeta@gmail.com>
Co-authored-by: Callum Fare <callum@codeplay.com>
Co-authored-by: Aaron Greig <aaron.greig@codeplay.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.