chore: Dockerfile-cuda - Retain major CC when pruning static cuBLAS lib #635

Open: wants to merge 1 commit into main

Conversation

polarathene

What does this PR do?

Pruning cuBLAS for CC 7.5 now also retains sm_70 in addition to the sm_75 target. See #610 (comment) for more information.

@polarathene
Author

polarathene commented Jun 13, 2025

NOTE: There is no known need for this in TEI; however, NVIDIA encourages retaining the major CC (and any minor CCs in between) when using nvprune on cuBLAS.


Feel free to close the PR if you'd prefer to wait until there's a relevant bug report. My understanding is that this should only be an issue when cuBLAS uses a kernel that is only built for sm_70 and relies on binary compatibility to run on sm_75 hardware.

For example, in the current base image used for the build, sm_70 has 184 cubins vs. only 8 for sm_75:

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_70.*\.' | wc -l
184

$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_75.*\.' | wc -l
8

# Individual cubins:
$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -E '\.sm_75.*\.'
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
ELF file   21: libcublas_static.21.sm_75.cubin
ELF file   29: libcublas_static.29.sm_75.cubin
ELF file   37: libcublas_static.37.sm_75.cubin
ELF file   45: libcublas_static.45.sm_75.cubin
ELF file   53: libcublas_static.53.sm_75.cubin
ELF file   61: libcublas_static.61.sm_75.cubin
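The per-arch counting above can be reproduced without a CUDA install; a minimal sketch, where a here-doc stands in for real `cuobjdump --list-elf` output (the sample listing and `count_arch` helper are illustrative, not from the PR):

```shell
# Count cubins per SM architecture from a saved cuobjdump-style listing.
# The here-doc below is a tiny stand-in for `cuobjdump --list-elf` output.
listing=$(cat <<'EOF'
ELF file    1: libcublas_static.1.sm_70.cubin
ELF file    5: libcublas_static.5.sm_75.cubin
ELF file   13: libcublas_static.13.sm_75.cubin
EOF
)

count_arch() {
  # $1: arch suffix, e.g. sm_75; count lines containing ".sm_XX."
  printf '%s\n' "$listing" | grep -cE "\.$1\."
}

count_arch sm_70   # prints 1 for the sample listing
count_arch sm_75   # prints 2
```

The same `grep -oE '\.sm_NN.*\.' | wc -l` pipeline from the comment works too; `grep -c` just folds the match-and-count into one step.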

I'm not entirely sure why retaining the minor CC versions in between (when present) would matter.


The concern does not apply to the other two supported real archs handled via nvprune: sm_80 is already retained alongside the target arch, while sm_90 is the only arch within its CC major, so there is nothing additional to retain:

nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
then \
nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
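The CC 7.5 branch this PR adjusts follows the same pattern; a minimal sketch of the intended invocation, assembled as a string here since nvprune isn't actually run in this snippet (the paths mirror the fragment above):

```shell
# Sketch: prune cuBLAS for CC 7.5 while also retaining the sm_70 (major CC)
# cubins, per NVIDIA's guidance for nvprune on cuBLAS.
CUDA_COMPUTE_CAP=75
LIB=/usr/local/cuda/lib64/libcublas_static.a
CMD="nvprune --generate-code code=sm_70 --generate-code code=sm_${CUDA_COMPUTE_CAP} ${LIB} -o ${LIB}"
echo "$CMD"
```

In the Dockerfile this sits in the `elif [ ${CUDA_COMPUTE_CAP} -eq 75 ]` arm of the same `if` chain shown above.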
