Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[r2.18-rocm-enhanced] Use faster atomics on ROCM; Enable kernel argument preloading #2784

Open
wants to merge 2 commits into
base: r2.18-rocm-enhanced
Choose a base branch
from

Conversation

draganmladjenovic
Copy link

No description provided.

@@ -783,7 +822,7 @@ CREATE_CUDA_DEVICE_FUNCTION_ALIAS(GpuAtomicAdd, CudaAtomicAdd);
// GpuAtomicSub
template <typename T, typename U>
__device__ detail::ToTypeIfConvertible<U, T> GpuAtomicSub(T* ptr, U value) {
return atomicSub(ptr, value);
return atomicSub(detail::AddressSpaceHint<1>(ptr), value);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may I ask why atomicAdd(detail::AddressSpaceHint<3>(dst), val); is using shared memory while atomicSub(detail::AddressSpaceHint<1>(ptr), value); is using global memory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddressSpaceHint<3> is only for GpuAtomicAddShared that is used by rocm specific kernels.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is no GpuAtomicSubShared, which would use addr space 3 if it existed. I'm not sure why it doesn't exist since sub just calls add anyway.

Copy link

@i-chaochen i-chaochen Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why AddressSpaceHint<3> is not used by atomicSub?

atomicSub is rocm specific kernel as well I think? https://rocm.docs.amd.com/projects/HIP/en/docs-5.7.0/reference/kernel_language.html#atomic-functions

int atomicSub(int* address, int val)

@jayfurmanek jayfurmanek changed the title R2.18 rocm enhanced bd [r2.18-rocm-enhanced] Use faster atomics on ROCM; Enable kernel argument preloading Dec 16, 2024
@amd-jianli12 amd-jianli12 force-pushed the r2.18-rocm-enhanced-bd branch from 21438d5 to 891c0d2 Compare January 8, 2025 01:16
@amd-jianli12 amd-jianli12 force-pushed the r2.18-rocm-enhanced-bd branch from 891c0d2 to 9174f23 Compare January 8, 2025 06:34
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants