[SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) #3950
Conversation
…ion when device has atomic64. Only the cuda backend currently supports the atomic64 device aspect. SYCL 2020 introduces the atomic64 aspect, which is required for the use of atomic_ref.fetch_add with an fp64 operand. These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator that makes use of atomics at the group level. If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
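The aspect-gated dispatch described above can be sketched in portable standard C++. This is an illustrative analogue under assumed names (reduce_add, has_atomic64), not the actual SYCL runtime code; a CAS loop stands in for the native fp64 fetch_add:

```cpp
#include <atomic>
#include <numeric>
#include <vector>

// Hypothetical analogue of the PR's dispatch: when the device reports the
// atomic64 aspect, partial results are combined with an atomic fp64 add;
// otherwise the existing non-atomic reduction path is used.
double reduce_add(const std::vector<double> &partials, bool has_atomic64) {
  if (has_atomic64) {
    // Atomic path: each "work-group leader" contributes its partial sum.
    // The CAS loop emulates atomic_ref<double>::fetch_add portably.
    std::atomic<double> result{0.0};
    for (double p : partials) {
      double old = result.load();
      while (!result.compare_exchange_weak(old, old + p)) {
        // `old` is refreshed on failure; retry until the add lands.
      }
    }
    return result.load();
  }
  // Fallback path: an ordinary reduction with no atomic operations.
  return std::accumulate(partials.begin(), partials.end(), 0.0);
}
```

Both paths produce the same sum; the atomic path only pays off when many work-groups contribute concurrently, which is the situation the specialized reduction targets.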
Approve to let testing start
@intel/llvm-reviewers-cuda, @smaslov-intel, ping.
sycl/include/CL/sycl/aspects.hpp
@@ -38,7 +38,8 @@ enum class aspect {
   ext_intel_gpu_subslices_per_slice = 22,
   ext_intel_gpu_eu_count_per_subslice = 23,
   ext_intel_max_mem_bandwidth = 24,
-  ext_intel_mem_channel = 25
+  ext_intel_mem_channel = 25,
+  atomic64 = 26
Technically, there are int64_base_atomics and int64_extended_atomics that overlap with atomic64. Shouldn't we deprecate the int64_*_atomics aspects?
…plugin. This change ensures that all Reduction tests run for the cuda backend. The cl_khr_fp64 extension is used for all cuda devices by default, since sm_XX where XX < 13 has been unsupported by the cuda driver since cuda 8.0. The test fp16-with-unnamed-lambda.cpp has been deleted because it has a duplicate in the test suite (in the dir SYCL/Regression). In both cases the triple is missing from the first line; it needs to be added to the llvm-test-suite copy to avoid a test failure now that the test is not skipped for the cuda backend. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite which marks the cuda case as XFAIL. I will also correct the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.
I suggest moving the fp16-related changes to a separate pull request. The current PR title and description mention 64-bit data types only.
Yes, you are right, thanks. I'll revert the fp16 change and open a separate PR.
@intel/llvm-reviewers-cuda, @smaslov-intel, @intel/llvm-reviewers-runtime, @v-klochkov, @Pennycook, can you take a look, please?
…he cuda plugin." This reverts commit 80c8508.
This change allows a skipped fp64 Reduction test in llvm-test-suite (reduction_nd_ext_double.cpp) to run for the cuda backend. The cl_khr_fp64 extension is used for all cuda devices by default, since sm_XX where XX < 13 has been unsupported by the cuda driver since cuda 8.0. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Please add a definition of the "atomic64" extension in the SYCL 2020 approved way.
Renamed has_atomic_add_float to has_atomic_add_float64, since the general usage that includes float32 is only expected to be temporary. has_atomic_add_float64 is a pseudonym of IsReduOptForAtomic64Add. Updated the documentation describing the current temporary usage of fp32 within IsReduOptForAtomic64Add. IsReduOptForFastFloatAtomicAdd has been renamed IsReduOptForAtomic64Add to indicate that this boolean should only be used when the device has the SYCL 2020 atomic64 aspect, consistent with the naming convention used in other functions that are specializations for the atomic64 aspect. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk, there are conflicts in sycl/include/CL/sycl/aspects.hpp. Could you merge the tip of the …
Sure. Thanks for the reminder.
Plugin.call_nocheck casts result to false if PI_SUCCESS was not returned. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
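The pattern this commit message describes can be sketched as follows. All names here are hypothetical stand-ins (the real PI error codes and plugin entry points live in intel/llvm's pi.h); the point is that the wrapper collapses any non-PI_SUCCESS status to false instead of throwing:

```cpp
// Sketch of a call_nocheck-style wrapper (names hypothetical): the plugin
// entry point reports a status code, and the wrapper maps anything other
// than PI_SUCCESS to false instead of raising an error.
enum pi_result { PI_SUCCESS = 0, PI_ERROR_INVALID_OPERATION = -59 };

template <typename Fn, typename... Args>
bool call_nocheck(Fn fn, Args &&...args) {
  return fn(static_cast<Args &&>(args)...) == PI_SUCCESS;
}

// Hypothetical plugin entry point: only accepts non-negative sizes.
inline pi_result piMemAllocStub(long size) {
  return size >= 0 ? PI_SUCCESS : PI_ERROR_INVALID_OPERATION;
}
```

A caller can then write `if (call_nocheck(piMemAllocStub, n)) { ... }` and treat any failure uniformly, which is convenient when probing optional device capabilities such as atomic64.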
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Thanks! LGTM.
Remove comments.
Now that the fp16 aspect is connected to the cuda PI (https://github.com/intel/llvm/pull/4029/files), one test case (fp16-with-unnamed-lambda.cpp) that now runs for cuda if the device has the fp16 aspect failed because it was missing the triple for ptx. The triple has been added. nan.cpp fails for the fp64 case that was switched on by intel/llvm#3950. Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk, this patch regressed …
Yes, there is a test case in nan.cpp that requires the fp64 device aspect; it was previously skipped for the cuda backend since the fp64 aspect was always set to false. Now the cuda device has the fp64 aspect by default following this PR being merged, so the test runs. This PR should not have affected anything else in the nan.cpp test. I have set the nan.cpp test to XFAIL for cuda in the following llvm-test-suite PR: intel/llvm-test-suite#336
* upstream/sycl: (649 commits)
  [SYCL][Driver][NFC] Update integration footer test for 32-bit host (intel#4039)
  [SYCL][L0] Initialize descriptor .stype and .pNext (intel#4032)
  [SYCL] Add sycl::kernel::get_kernel_bundle method (intel#3855)
  [SYCL] Add support for device UUID as a SYCL extension. (intel#3696)
  [SYCL][Matrix] Add spec document for the matrix extension interface and its first implementation for AMX (intel#3551)
  Fix debug build mangler test after PR#3992 (8f38045). (intel#4033)
  [Driver][SYCL] Restrict user -include file in final integration footer step (intel#4036)
  [SYCL] [Tests] Do not copy device binary image mocks (intel#4023)
  [SYCL][Doc] Update docs to reflect new compiler features (intel#4030)
  [SYCL][CUDA] cl_khr_fp16 extension connected to cuda PI. (intel#4029)
  [SYCL][NFC] Refactor RT unit tests (intel#4021)
  [SYCL] Switch to using integration footer by default (intel#3777)
  [SYCL][CUDA] Add the Use Default Stream property (intel#4004)
  Uplift GPU RT version for Linux to 21.24.20098 (intel#4003)
  [SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) (intel#3950)
  [Driver][SYCL] Differentiate host dependency link from regular host link (intel#4002)
  [SYCL][ESIMD] Support device half type in intrinsics. (intel#4024)
  [SYCL] Allow fpga_reg only for PODs and Trivially-copyable structs (intel#3643)
  [SYCL][FPGA] Restore legacy debug info version for the hardware (intel#3991)
  [SYCL][PI][L0] Force reset of memcpy command-list. (intel#4001)
  ...
[SYCL][CUDA] atomic_ref.fetch_add used for fp64 (add operator) reduction if device has the atomic64 aspect.
The atomic64 device aspect has been added. Only the cuda backend currently supports the atomic64 device aspect.
SYCL 2020 introduces the atomic64 aspect, which is required for the use of atomic_ref.fetch_add with an fp64 operand.
These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator that performs atomics at the group level via atomic_ref.
If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
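As a rough illustration of the group-level atomic accumulation that atomic_ref.fetch_add enables, here is a portable C++ stand-in (std::thread plus a CAS loop on std::atomic<double>; all names below are assumptions). In real SYCL code the native fp64 fetch_add is only legal on devices that report the atomic64 aspect, which is exactly what this PR gates on:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// CAS-loop stand-in for atomic_ref<double>::fetch_add: returns the value
// observed immediately before the add, as fetch_add does.
double fetch_add_double(std::atomic<double> &target, double operand) {
  double old = target.load();
  while (!target.compare_exchange_weak(old, old + operand)) {
    // `old` is refreshed on failure; retry until the add succeeds.
  }
  return old;
}

// Several "work-groups" (threads here) fold their partial sums into one
// shared result concurrently; the CAS loop makes the accumulation race-free.
double concurrent_reduce(int groups, int adds_per_group) {
  std::atomic<double> result{0.0};
  std::vector<std::thread> workers;
  for (int g = 0; g < groups; ++g)
    workers.emplace_back([&result, adds_per_group] {
      for (int i = 0; i < adds_per_group; ++i)
        fetch_add_double(result, 1.0);
    });
  for (auto &w : workers)
    w.join();
  return result.load();
}
```

Because each contribution is 1.0 and the total stays far below 2^53, the floating-point sum is exact regardless of the interleaving, so the result is deterministic even though the ordering of the atomic adds is not.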