
Add tuning option for shared memory size #61

Open
wants to merge 1 commit into base: master
Conversation

@LutzCle (Contributor) commented on Oct 15, 2021

Pascal and newer devices support shared memory sizes larger than 48 KiB per thread block. This is an opt-in feature introduced in CUDA 9.0: the desired size must be requested explicitly through a kernel function attribute before launch.

The attributes required to opt in are:

  • CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES
  • CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT

This PR adds tuning support to RustaCUDA as in this example:

// Query the maximum opt-in shared memory size supported by the device
let max_shared_mem_bytes = device.get_attribute(DeviceAttribute::MaxSharedMemoryPerBlockOptin)? as u32;

// Set the function attribute
let function_name = std::ffi::CString::new(...).unwrap();
let mut function = module.get_function(&function_name)?;
function.set_max_dynamic_shared_size_bytes(max_shared_mem_bytes)?;

// Launch the kernel
unsafe { launch!( function<<<grid, block, max_shared_mem_bytes, stream>>>()).unwrap() };
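For context, a sketch of the equivalent raw CUDA driver API calls that a wrapper like this would make underneath (error handling elided for brevity; the function name opt_in_to_large_shared_memory is illustrative, not part of any API, and running this requires a CUDA toolkit and a Pascal-or-newer device):

```c
#include <cuda.h>

/* Illustrative helper: query the opt-in maximum and raise the kernel's
 * dynamic shared memory limit accordingly. */
void opt_in_to_large_shared_memory(CUdevice device, CUfunction function)
{
    int max_shared_mem_bytes = 0;

    /* The opt-in maximum exceeds the default 48 KiB on Pascal and newer. */
    cuDeviceGetAttribute(&max_shared_mem_bytes,
                         CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN,
                         device);

    /* Opt in by setting the per-kernel attribute (available since CUDA 9.0). */
    cuFuncSetAttribute(function,
                       CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
                       max_shared_mem_bytes);
}
```

After this call, the kernel may be launched with a dynamic shared memory size up to the queried maximum rather than the 48 KiB default.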

For more information, see the CUDA documentation and the Pascal tuning guide.
