
[SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) #3950


Merged: 11 commits merged into intel:sycl from JackAKirk:reduction_atomic64_add on Jun 30, 2021

Conversation

@JackAKirk (Contributor)

[SYCL][CUDA] atomic_ref.fetch_add used for fp64 (add operator) reduction if device has the atomic64 aspect.

This PR adds the atomic64 device aspect; the cuda backend is currently the only backend that reports it.

SYCL 2020 introduces the atomic64 aspect, which is required for using atomic_ref::fetch_add with an fp64 operand. These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator, performing atomics at the group level via atomic_ref. If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used.
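
For illustration, here is a minimal sketch of the user-facing pattern this specialization targets, written against the standard SYCL 2020 reduction API (at the time of this PR, DPC++ spelled the reduction interface under the sycl::ONEAPI namespace; the spellings below assume the 2020 standard):

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  sycl::queue q;

  // The atomic fetch_add path applies only when the device reports
  // the atomic64 aspect (currently only the cuda backend does).
  if (!q.get_device().has(sycl::aspect::atomic64))
    return 0;

  std::vector<double> data(1024, 1.0);
  double sum = 0.0;
  {
    sycl::buffer<double> in(data);
    sycl::buffer<double> out(&sum, 1);
    q.submit([&](sycl::handler &cgh) {
      sycl::accessor acc(in, cgh, sycl::read_only);
      // fp64 add reduction: with the atomic64 aspect, group-level
      // partial results can be combined via atomic_ref::fetch_add.
      auto red = sycl::reduction(out, cgh, sycl::plus<double>());
      cgh.parallel_for(sycl::range<1>(data.size()), red,
                       [=](sycl::id<1> i, auto &s) { s += acc[i]; });
    });
  } // buffer destruction writes the result back to sum
  return sum == 1024.0 ? 0 : 1;
}
```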

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

…ion when device has atomic64.

The cuda backend is currently the only backend that reports the atomic64 device aspect.

SYCL 2020 introduces the atomic64 aspect, which is required for using atomic_ref::fetch_add with an fp64 operand.
These changes allow devices with the atomic64 aspect to use a specialized reduction for the add operator, performing atomics at the group level.
If the atomic64 aspect is not available, the existing default implementation, which does not use atomic operations, is used.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk JackAKirk requested review from smaslov-intel and a team as code owners June 17, 2021 12:03
vladimirlaz previously approved these changes Jun 17, 2021

@vladimirlaz (Contributor) left a comment:
Approve to let testing start

@vladimirlaz vladimirlaz self-requested a review June 17, 2021 13:02
@bader bader added the cuda CUDA back-end label Jun 21, 2021
@bader (Contributor) commented Jun 21, 2021:

@intel/llvm-reviewers-cuda, @smaslov-intel, ping.

@bader bader requested review from Pennycook and v-klochkov June 21, 2021 10:46
```diff
@@ -38,7 +38,8 @@ enum class aspect {
   ext_intel_gpu_subslices_per_slice = 22,
   ext_intel_gpu_eu_count_per_subslice = 23,
   ext_intel_max_mem_bandwidth = 24,
-  ext_intel_mem_channel = 25
+  ext_intel_mem_channel = 25,
+  atomic64 = 26
```
A reviewer (Contributor) commented on this hunk:

Technically, there are int64_base_atomics and int64_extended_atomics that overlap with atomic64. Shouldn't we deprecate int64_*_atomics aspects?
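
For context, a short sketch of how the overlapping queries look from user code; the spellings are SYCL 2020 ones, and the helper names are illustrative:

```cpp
#include <sycl/sycl.hpp>

bool supportsFp64Atomics(const sycl::device &dev) {
  // New SYCL 2020 aspect added by the hunk above.
  return dev.has(sycl::aspect::atomic64);
}

bool supportsFp64AtomicsLegacy(const sycl::device &dev) {
  // Pre-existing aspects the reviewer notes as overlapping.
  return dev.has(sycl::aspect::int64_base_atomics) &&
         dev.has(sycl::aspect::int64_extended_atomics);
}
```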

…plugin.

This change ensures that all Reduction tests run for the cuda backend.
The cl_khr_fp64 extension is enabled for all cuda devices by default, since sm_XX for XX < 13 has been unsupported by the cuda driver since CUDA 8.0.
The test fp16-with-unnamed-lambda.cpp has been deleted because it has a duplicate in the test suite (in the dir SYCL/Regression).
In both cases the triple is missing on the first line; it needs to be added to the llvm-test-suite copy to avoid a test failure now that the test is no longer skipped for the cuda backend.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
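
For reference, SYCL end-to-end tests in llvm-test-suite pass the device triple via a LIT substitution on the first RUN line. Below is a hedged sketch of the kind of header the commit describes as missing the triple, together with the XFAIL marker mentioned in the next comment; it is a reconstruction, not the verbatim test file:

```cpp
// Hypothetical test header sketch, not the actual file contents.
// XFAIL: cuda
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
```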
@JackAKirk (Contributor, Author) commented:

The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.

@bader (Contributor) commented Jun 23, 2021:

> The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.

I suggest moving the fp16-related changes to a separate pull request. The current PR title and description mention 64-bit data types only.

@JackAKirk (Contributor, Author) commented:

> The last commit also means that the llvm-test-suite/SYCL/DeviceLib/built-ins/nan.cpp test, which was previously skipped, now runs for cuda. This test fails the fp64 case. I will open a PR in the test suite that marks the cuda case as XFAIL. I will also add the missing triple in the first line of the fp16-with-unnamed-lambda.cpp test.
>
> I suggest moving the fp16-related changes to a separate pull request. The current PR title and description mention 64-bit data types only.

Yes, you are right, thanks. I'll revert the fp16 change and open a separate PR.

@bader (Contributor) commented Jun 23, 2021:

@intel/llvm-reviewers-cuda, @smaslov-intel, @intel/llvm-reviewers-runtime, @v-klochkov, @Pennycook, can you take a look, please?

JackAKirk added 2 commits June 23, 2021 10:50
This change allows a skipped fp64 Reduction test in llvm-test-suite (reduction_nd_ext_double.cpp) to run for the cuda backend.
The cl_khr_fp64 extension is enabled for all cuda devices by default, since sm_XX for XX < 13 has been unsupported by the cuda driver since CUDA 8.0.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
bader previously approved these changes Jun 23, 2021
Pennycook previously approved these changes Jun 23, 2021

@smaslov-intel (Contributor) left a comment:

Please add a definition of the "atomic64" extension in the SYCL 2020 approved way.

Renamed has_atomic_add_float to has_atomic_add_float64, since the general usage that includes float32 is only expected to be temporary.
has_atomic_add_float64 is an alias of IsReduOptForAtomic64Add.
Updated the documentation describing the current temporary use of fp32 within IsReduOptForAtomic64Add.
IsReduOptForFastFloatAtomicAdd has been renamed to IsReduOptForAtomic64Add to signal that this boolean should only be used when the device has the SYCL 2020 atomic64 aspect, consistent with the naming convention used in other functions that are specializations for the atomic64 aspect.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
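
A simplified sketch of the split the renamed trait guards; the trait name follows the commit text, while the dispatch machinery and signatures here are illustrative, not the actual library code:

```cpp
#include <sycl/sycl.hpp>
#include <type_traits>

// Illustrative stand-in for the real trait: true for an fp64 add
// (and, per the commit message, temporarily also for fp32 add).
template <typename T, typename BinaryOp>
constexpr bool IsReduOptForAtomic64Add =
    std::is_same_v<BinaryOp, sycl::plus<T>> &&
    (std::is_same_v<T, double> || std::is_same_v<T, float>);

template <typename T, typename BinaryOp>
void runReduction(sycl::queue &q) {
  if constexpr (IsReduOptForAtomic64Add<T, BinaryOp>) {
    // The compile-time trait alone is not enough: the fast path also
    // requires the runtime atomic64 aspect check.
    if (q.get_device().has(sycl::aspect::atomic64)) {
      // Fast path: combine per-work-group partial sums into the
      // global result with sycl::atomic_ref<T, ...>::fetch_add.
      return;
    }
  }
  // Fallback: the existing implementation without atomic operations.
}
```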
@JackAKirk JackAKirk dismissed stale reviews from Pennycook and bader via ab8a600 June 24, 2021 10:47
@JackAKirk JackAKirk dismissed stale reviews from v-klochkov and Pennycook via 49ec72a June 28, 2021 13:12
@bader (Contributor) commented Jun 28, 2021:

@JackAKirk, there are conflicts in sycl/include/CL/sycl/aspects.hpp. Could you merge the tip of the sycl branch into the reduction_atomic64_add branch to resolve them, please?

@JackAKirk (Contributor, Author) commented:

> @JackAKirk, there are conflicts in sycl/include/CL/sycl/aspects.hpp. Could you merge the tip of the sycl branch into the reduction_atomic64_add branch to resolve them, please?

Sure. Thanks for the reminder.

@bader bader requested a review from smaslov-intel June 28, 2021 14:56
The Plugin.call_nocheck result is treated as false if PI_SUCCESS was not returned.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
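
The pattern the commit message describes, sketched as a fragment in the runtime's context; the Plugin and Device objects are assumed from the surrounding runtime code, and the info-query enum name is illustrative rather than verified against the plugin headers:

```cpp
// Illustrative sketch: call_nocheck returns an error code instead of
// asserting, so an unsupported query degrades to "aspect absent"
// (false) rather than a hard failure.
pi_bool result = PI_FALSE;
bool hasAtomic64 =
    Plugin.call_nocheck<PiApiKind::piDeviceGetInfo>(
        Device, PI_DEVICE_INFO_ATOMIC_64, sizeof(result), &result,
        nullptr) == PI_SUCCESS &&
    result == PI_TRUE;
```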
@bader bader requested a review from smaslov-intel June 29, 2021 08:58
bader previously approved these changes Jun 29, 2021
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
smaslov-intel previously approved these changes Jun 29, 2021

@smaslov-intel (Contributor) left a comment:

Thanks! LGTM.

@bader bader merged commit 544fb7c into intel:sycl Jun 30, 2021
JackAKirk pushed a commit to JackAKirk/llvm-test-suite that referenced this pull request Jun 30, 2021
Now that the fp16 aspect is connected to the cuda PI (https://github.com/intel/llvm/pull/4029/files), one test case (fp16-with-unnamed-lambda.cpp), which now runs for cuda if the device has the fp16 aspect, failed because it was missing the triple for ptx. The triple has been added.
nan.cpp fails for the fp64 case that was switched on by intel/llvm#3950.

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@bader (Contributor) commented Jun 30, 2021:

@JackAKirk, this patch regressed SYCL/DeviceLib/built-ins/nan.cpp from intel/llvm-test-suite.

@JackAKirk (Contributor, Author) commented:

> @JackAKirk, this patch regressed SYCL/DeviceLib/built-ins/nan.cpp from intel/llvm-test-suite.

Yes, there is a test case in nan.cpp that requires the fp64 device aspect and was previously skipped for the cuda backend, since fp64 was always set to false. Following this PR being merged, the cuda device has the fp64 aspect by default, so the test now runs. This PR should not have affected anything else in the nan.cpp test. I have set the nan.cpp test to XFAIL for cuda in the following llvm-test-suite PR: intel/llvm-test-suite#336

vladimirlaz pushed a commit to intel/llvm-test-suite that referenced this pull request Jul 1, 2021
alexbatashev added a commit to alexbatashev/llvm that referenced this pull request Jul 2, 2021
* upstream/sycl: (649 commits)
  [SYCL][Driver][NFC] Update integration footer test for 32-bit host (intel#4039)
  [SYCL][L0] Initialize descriptor .stype and .pNext (intel#4032)
  [SYCL] Add sycl::kernel::get_kernel_bundle method (intel#3855)
  [SYCL] Add support for device UUID as a SYCL extension. (intel#3696)
  [SYCL][Matrix] Add spec document for the matrix extension interface and its first implementation for AMX (intel#3551)
  Fix debug build mangler test after PR#3992 (8f38045). (intel#4033)
  [Driver][SYCL] Restrict user -include file in final integration footer step (intel#4036)
  [SYCL] [Tests] Do not copy device binary image mocks (intel#4023)
  [SYCL][Doc] Update docs to reflect new compiler features (intel#4030)
  [SYCL][CUDA] cl_khr_fp16 extension connected to cuda PI. (intel#4029)
  [SYCL][NFC] Refactor RT unit tests (intel#4021)
  [SYCL] Switch to using integration footer by default (intel#3777)
  [SYCL][CUDA] Add the Use Default Stream property (intel#4004)
  Uplift GPU RT version for Linux to 21.24.20098 (intel#4003)
  [SYCL][CUDA] atomic_ref.fetch_add used for fp64 reduction if device.has(atomic64) (intel#3950)
  [Driver][SYCL] Differentiate host dependency link from regular host link (intel#4002)
  [SYCL][ESIMD] Support device half type in intrinsics. (intel#4024)
  [SYCL] Allow fpga_reg only for PODs and Trivially-copyable structs (intel#3643)
  [SYCL][FPGA] Restore legacy debug info version for the hardware (intel#3991)
  [SYCL][PI][L0] Force reset of memcpy command-list. (intel#4001)
  ...
smaslov-intel pushed a commit to smaslov-intel/llvm-test-suite that referenced this pull request Aug 12, 2021
…l#336)

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…l/llvm-test-suite#336)

Labels: cuda CUDA back-end
Projects: None yet
6 participants