Resources
Matthew Nicely edited this page Dec 11, 2022 · 1 revision
We have also described the structure of an efficient GEMM in our talk at the GPU Technology Conference 2018.
- CUTLASS: Software Primitives for Dense Linear Algebra at All Levels and Scales within CUDA
- Developing CUDA Kernels to Push Tensor Cores to the Absolute Limit on NVIDIA A100
- Accelerating Convolution with Tensor Cores in CUTLASS
- Accelerating Backward Data Gradient by Increasing Tensor Core Utilization in CUTLASS
- CUTLASS: Python API, Enhancements, and NVIDIA Hopper