
[SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) #3950


Merged: 11 commits merged into intel:sycl from JackAKirk:reduction_atomic64_add on Jun 30, 2021

Conversation

@JackAKirk (Contributor)

[SYCL][CUDA] atomic_ref.fetch_add used for fp64 (add operator) reduction if device has the atomic64 aspect.

This PR adds the atomic64 device aspect; the cuda backend is currently the only backend that reports it.

SYCL 2020 introduces the atomic64 aspect, which is required for using atomic_ref::fetch_add with an fp64 operand. These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator, performing atomics at the group level via atomic_ref. If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used.
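
For illustration, here is a minimal sketch of the user-facing pattern this specialization targets, written against the standard SYCL 2020 reduction API (at the time of this PR, DPC++ spelled the reduction interface under the sycl::ONEAPI namespace; the spellings below assume the 2020 standard):

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  sycl::queue q;

  // The atomic fetch_add path applies only when the device reports
  // the atomic64 aspect (currently only the cuda backend does).
  if (!q.get_device().has(sycl::aspect::atomic64))
    return 0;

  std::vector<double> data(1024, 1.0);
  double sum = 0.0;
  {
    sycl::buffer<double> in(data);
    sycl::buffer<double> out(&sum, 1);
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(in, cgh, sycl::read_only);
      // fp64 add reduction: with the atomic64 aspect, group-level
      // partial results can be combined via atomic_ref::fetch_add.
      auto red = sycl::reduction(out, cgh, sycl::plus<double>());
      cgh.parallel_for(sycl::range<1>(data.size()), red,
                       [=](sycl::id<1> i, auto &s) { s += acc[i]; });
    });
  } // buffer destruction writes the result back to sum
  return sum == 1024.0 ? 0 : 1;
}
```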

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…ion when device has atomic64.

The cuda backend is currently the only backend that reports the atomic64 device aspect.

SYCL 2020 introduces the atomic64 aspect, which is required for using atomic_ref::fetch_add with an fp64 operand.
These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator, performing atomics at the group level.
If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested review from smaslov-intel and a team as code owners June 17, 2021 12:03
vladimirlaz previously approved these changes Jun 17, 2021

@vladimirlaz (Contributor) left a comment:
Approve to let testing start

@vladimirlaz vladimirlaz self-requested a review June 17, 2021 13:02
@bader bader added the cuda CUDA back-end label Jun 21, 2021
@bader (Contributor) commented Jun 21, 2021:

@intel/llvm-reviewers-cuda, @smaslov-intel, ping.

@bader bader requested review from Pennycook and v-klochkov June 21, 2021 10:46
```diff
@@ -38,7 +38,8 @@ enum class aspect {
   ext_intel_gpu_subslices_per_slice = 22,
   ext_intel_gpu_eu_count_per_subslice = 23,
   ext_intel_max_mem_bandwidth = 24,
-  ext_intel_mem_channel = 25
+  ext_intel_mem_channel = 25,
+  atomic64 = 26
```
A reviewer (Contributor) commented on this hunk:

Technically, there are int64_base_atomics and int64_extended_atomics that overlap with atomic64. Shouldn't we deprecate int64_*_atomics aspects?
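
For context, a short sketch of how the overlapping queries look from user code; the spellings are SYCL 2020 ones, and the helper names are illustrative:

```cpp
#include <sycl/sycl.hpp>

bool supportsFp64Atomics(const sycl::device &dev) {
  // New SYCL 2020 aspect added by the hunk above.
  return dev.has(sycl::aspect::atomic64);
}

bool supportsFp64AtomicsLegacy(const sycl::device &dev) {
  // Pre-existing aspects the reviewer notes as overlapping.
  return dev.has(sycl::aspect::int64_base_atomics) &&
         dev.has(sycl::aspect::int64_extended_atomics);
}
```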

…plugin.

This change ensures that all Reduction tests run for the cuda backend.
The cl_khr_fp64 extension is enabled for all cuda devices by default, since sm_XX for XX < 13 has been unsupported by the cuda driver since CUDA 8.0.
The test fp16-with-unnamed-lambda.cpp has been deleted because it has a duplicate in the test suite (in the dir SYCL/Regression).
In both cases the triple is missing on the first line; it needs to be added to the llvm-test-suite copy to avoid a test failure now that the test is no longer skipped for the cuda backend.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
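
For reference, SYCL end-to-end tests in llvm-test-suite pass the device triple via a LIT substitution on the first RUN line. Below is a hedged sketch of the kind of header the commit describes as missing the triple, together with the XFAIL marker mentioned in the next comment; it is a reconstruction, not the verbatim test file:

```cpp
// Hypothetical test header sketch, not the actual file contents.
// XFAIL: cuda
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
```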
@JackAKirk (Contributor, Author) commented:

The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.

@bader (Contributor) commented Jun 23, 2021:

> The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.

I suggest moving the fp16-related changes to a separate pull request. The current PR title and description mention 64-bit data types only.

@JackAKirk (Contributor, Author) commented:

> The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.
>
> I suggest moving the fp16-related changes to a separate pull request. The current PR title and description mention 64-bit data types only.

Yes, you are right, thanks. I'll revert the fp16 change and open a separate PR.

@bader (Contributor) commented Jun 23, 2021:

@intel/llvm-reviewers-cuda, @smaslov-intel, @intel/llvm-reviewers-runtime, @v-klochkov, @Pennycook, can you take a look, please?

JackAKirk added 2 commits June 23, 2021 10:50
This change allows a skipped fp64 Reduction test in llvm-test-suite (reduction_nd_ext_double.cpp) to run for the cuda backend.
The cl_khr_fp64 extension is enabled for all cuda devices by default, since sm_XX for XX < 13 has been unsupported by the cuda driver since CUDA 8.0.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
bader previously approved these changes Jun 23, 2021
Pennycook previously approved these changes Jun 23, 2021

@smaslov-intel (Contributor) left a comment:

Please add a definition of the "atomic64" extension in the SYCL 2020 approved way.

Renamed has_atomic_add_float to has_atomic_add_float64, since the general usage that includes float32 is only expected to be temporary.
has_atomic_add_float64 is an alias of IsReduOptForAtomic64Add.
Updated the documentation describing the current temporary use of fp32 within IsReduOptForAtomic64Add.
IsReduOptForFastFloatAtomicAdd has been renamed to IsReduOptForAtomic64Add to signal that this boolean should only be used when the device has the SYCL 2020 atomic64 aspect, consistent with the naming convention used in other functions that are specializations for the atomic64 aspect.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
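
A simplified sketch of the split the renamed trait guards; the trait name follows the commit text, while the dispatch machinery and signatures here are illustrative, not the actual library code:

```cpp
#include <sycl/sycl.hpp>
#include <type_traits>

// Illustrative stand-in for the real trait: true for an fp64 add
// (and, per the commit message, temporarily also for fp32 add).
template <typename T, typename BinaryOp>
constexpr bool IsReduOptForAtomic64Add =
    std::is_same_v<BinaryOp, sycl::plus<T>> &&
    (std::is_same_v<T, double> || std::is_same_v<T, float>);

template <typename T, typename BinaryOp>
void runReduction(sycl::queue &q) {
  if constexpr (IsReduOptForAtomic64Add<T, BinaryOp>) {
    // The compile-time trait alone is not enough: the fast path also
    // requires the runtime atomic64 aspect check.
    if (q.get_device().has(sycl::aspect::atomic64)) {
      // Fast path: combine per-work-group partial sums into the
      // global result with sycl::atomic_ref<T, ...>::fetch_add.
      return;
    }
  }
  // Fallback: the existing implementation without atomic operations.
}
```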
@JackAKirk JackAKirk dismissed stale reviews from Pennycook and bader via ab8a600 June 24, 2021 10:47
@JackAKirk JackAKirk dismissed stale reviews from v-klochkov and Pennycook via 49ec72a June 28, 2021 13:12
@bader (Contributor) commented Jun 28, 2021:

@JackAKirk, there are conflicts in sycl/include/CL/sycl/aspects.hpp. Could you merge the tip of the sycl branch into the reduction_atomic64_add branch to resolve them, please?

@JackAKirk (Contributor, Author) commented:

> @JackAKirk, there are conflicts in sycl/include/CL/sycl/aspects.hpp. Could you merge the tip of the sycl branch into the reduction_atomic64_add branch to resolve them, please?

Sure. Thanks for the reminder.

@bader bader requested a review from smaslov-intel June 28, 2021 14:56
The Plugin.call_nocheck result is treated as false if PI_SUCCESS was not returned.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
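
The pattern the commit message describes, sketched as a fragment in the runtime's context; the Plugin and Device objects are assumed from the surrounding runtime code, and the info-query enum name is illustrative rather than verified against the plugin headers:

```cpp
// Illustrative sketch: call_nocheck returns an error code instead of
// asserting, so an unsupported query degrades to "aspect absent"
// (false) rather than a hard failure.
pi_bool result = PI_FALSE;
bool hasAtomic64 =
    Plugin.call_nocheck<PiApiKind::piDeviceGetInfo>(
        Device, PI_DEVICE_INFO_ATOMIC_64, sizeof(result), &result,
        nullptr) == PI_SUCCESS &&
    result == PI_TRUE;
```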
@bader bader requested a review from smaslov-intel June 29, 2021 08:58
bader previously approved these changes Jun 29, 2021
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
smaslov-intel previously approved these changes Jun 29, 2021

@smaslov-intel (Contributor) left a comment:

Thanks! LGTM.

@bader bader merged commit 544fb7c into intel:sycl Jun 30, 2021
JackAKirk pushed a commit to JackAKirk/llvm-test-suite that referenced this pull request Jun 30, 2021
Now that the fp16 aspect is connected to the cuda PI (https://github.com/intel/llvm/pull/4029/files), one test case (fp16-with-unnamed-lambda.cpp), which now runs for cuda if the device has the fp16 aspect, failed because it was missing the triple for ptx. The triple has been added.
nan.cpp fails for the fp64 case that was switched on by intel/llvm#3950.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@bader (Contributor) commented Jun 30, 2021:

@JackAKirk, this patch regressed SYCL/DeviceLib/built-ins/nan.cpp from intel/llvm-test-suite.

@JackAKirk (Contributor, Author) commented:

> @JackAKirk, this patch regressed SYCL/DeviceLib/built-ins/nan.cpp from intel/llvm-test-suite.

Yes, there is a test case in nan.cpp that requires the fp64 device aspect and was previously skipped for the cuda backend, since fp64 was always set to false. Following this PR being merged, the cuda device has the fp64 aspect by default, so the test now runs. This PR should not have affected anything else in the nan.cpp test. I have set the nan.cpp test to XFAIL for cuda in the following llvm-test-suite PR: intel/llvm-test-suite#336

vladimirlaz pushed a commit to intel/llvm-test-suite that referenced this pull request Jul 1, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Jul 2, 2021
* upstream/sycl: (649 commits)
  [SYCL][Driver][NFC] Update integration footer test for 32-bit host (intel#4039)
  [SYCL][L0] Initialize descriptor .stype and .pNext (intel#4032)
  [SYCL] Add sycl::kernel::get_kernel_bundle method (intel#3855)
  [SYCL] Add support for device UUID as a SYCL extension. (intel#3696)
  [SYCL][Matrix] Add spec document for the matrix extension interface and its first implementation for AMX (intel#3551)
  Fix debug build mangler test after PR#3992 (8f38045). (intel#4033)
  [Driver][SYCL] Restrict user -include file in final integration footer step (intel#4036)
  [SYCL] [Tests] Do not copy device binary image mocks (intel#4023)
  [SYCL][Doc] Update docs to reflect new compiler features (intel#4030)
  [SYCL][CUDA] cl_khr_fp16 extension connected to cuda PI. (intel#4029)
  [SYCL][NFC] Refactor RT unit tests (intel#4021)
  [SYCL] Switch to using integration footer by default (intel#3777)
  [SYCL][CUDA] Add the Use Default Stream property (intel#4004)
  Uplift GPU RT version for Linux to 21.24.20098 (intel#4003)
  [SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) (intel#3950)
  [Driver][SYCL] Differentiate host dependency link from regular host link (intel#4002)
  [SYCL][ESIMD] Support device half type in intrinsics. (intel#4024)
  [SYCL] Allow fpga_reg only for PODs and Trivially-copyable structs (intel#3643)
  [SYCL][FPGA] Restore legacy debug info version for the hardware (intel#3991)
  [SYCL][PI][L0] Force reset of memcpy command-list. (intel#4001)
  ...
smaslov-intel pushed a commit to smaslov-intel/llvm-test-suite that referenced this pull request Aug 12, 2021
…l#336)

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…l/llvm-test-suite#336)

Labels: cuda CUDA back-end
Projects: None yet
6 participants