chore: Dockerfile-cuda - Retain major CC when pruning static cuBLAS lib #635

Open: wants to merge 1 commit into main

Conversation

polarathene

What does this PR do?

Pruning cuBLAS for CC 7.5 now also retains sm_70 in addition to the sm_75 target. See #610 (comment) for more information.

@polarathene
Author

polarathene commented Jun 13, 2025

NOTE: There is no known need for this in TEI; however, NVIDIA encourages retaining the major CC (and any minor CCs in between) when using nvprune on cuBLAS.


Feel free to close the PR if you'd prefer to wait until there's a relevant bug report. My understanding is that this should only be an issue when cuBLAS uses a kernel that is only built for sm_70 and relies on binary compatibility to run on sm_75 hardware.

For example, in the current base image used for the build, sm_70 has 184 cubins vs. only 8 for sm_75:

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_70.*\.' | wc -l
184

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_75.*\.' | wc -l
8

# Individual cubins:
$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -E '\.sm_75.*\.'
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
ELF file   21: libcublas_static.21.sm_75.cubin
ELF file   29: libcublas_static.29.sm_75.cubin
ELF file   37: libcublas_static.37.sm_75.cubin
ELF file   45: libcublas_static.45.sm_75.cubin
ELF file   53: libcublas_static.53.sm_75.cubin
ELF file   61: libcublas_static.61.sm_75.cubin
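The per-arch counting above can be reproduced without a CUDA install; a minimal sketch, where a here-doc stands in for real `cuobjdump --list-elf` output (the sample listing and `count_arch` helper are illustrative, not from the PR):

```shell
# Count cubins per SM architecture from a saved cuobjdump-style listing.
# The here-doc below is a tiny stand-in for `cuobjdump --list-elf` output.
listing=$(cat <<'EOF'
ELF file    1: libcublas_static.1.sm_70.cubin
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
EOF
)

count_arch() {
  # $1: arch suffix, e.g. sm_75; count lines containing ".sm_XX."
  printf '%s\n' "$listing" | grep -cE "\.$1\."
}

count_arch sm_70   # prints 1 for the sample listing
count_arch sm_75   # prints 2
```

The same `grep -oE '\.sm_NN.*\.' | wc -l` pipeline from the comment works too; `grep -c` just folds the match-and-count into one step.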

I'm not entirely sure why retaining the minor CC versions in between (when present) would matter.


The concern does not apply to the other two supported real archs handled via nvprune: sm_80 is already retained alongside the target arch, while sm_90 is the only arch within its CC major, so there is nothing additional to retain:

nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
then \
nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
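The CC 7.5 branch this PR adjusts follows the same pattern; a minimal sketch of the intended invocation, assembled as a string here since nvprune isn't actually run in this snippet (the paths mirror the fragment above):

```shell
# Sketch: prune cuBLAS for CC 7.5 while also retaining the sm_70 (major CC)
# cubins, per NVIDIA's guidance for nvprune on cuBLAS.
CUDA_COMPUTE_CAP=75
LIB=/usr/local/cuda/lib64/libcublas_static.a
CMD="nvprune --generate-code code=sm_70 --generate-code code=sm_${CUDA_COMPUTE_CAP} ${LIB} -o ${LIB}"
echo "$CMD"
```

In the Dockerfile this sits in the `elif [ ${CUDA_COMPUTE_CAP} -eq 75 ]` arm of the same `if` chain shown above.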
