
cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported #1587

Closed
themanyone opened this issue Dec 3, 2023 · 8 comments
Labels: bug Something isn't working

@themanyone

The cuBLAS build compiles but does not work.

It seems related to issue #1447, but when I run the executable, I get a different error.

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported
current device: 0

I traced it to the following CUBLAS_CHECK. I can probably just comment it out in the code; I'll try that later, since I have other stuff to do.

    if (r2 == 1 && r3 == 1 && src0->nb[2]*src0->ne[2] == src0->nb[3] && src1->nb[2]*src1->ne[2] == src1->nb[3]) {
        // there is no broadcast and src0, src1 are contiguous across dims 2, 3
        // use cublasGemmStridedBatchedEx
        CUBLAS_CHECK(
        cublasGemmStridedBatchedEx(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N,
                ne01, ne11, ne10,
                &alpha_f16, (const char *) src0_as_f16, CUDA_R_16F, nb01/sizeof(half),  src0->nb[2]/sizeof(half),  // strideA
                            (const char *) src1_as_f16, CUDA_R_16F, nb11/sizeof(float), src1->nb[2]/sizeof(float), // strideB
                &beta_f16,  (      char *)     dst_f16, CUDA_R_16F, ne01,               dst->nb[2]/sizeof(float),  // strideC
                ne12*ne13,
                CUBLAS_COMPUTE_16F,
                CUBLAS_GEMM_DEFAULT_TENSOR_OP));
    } else {
        // use cublasGemmBatchedEx
        const int ne23 = ne12*ne13;

Supplementary system info.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0

nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M3000M                  Off | 00000000:01:00.0  On |                  N/A |
| N/A   66C    P0              31W /  75W |   1356MiB /  4096MiB |     95%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1241      G   /usr/libexec/Xorg                           152MiB |
|    0   N/A  N/A      2206      G   /usr/lib64/firefox/firefox                  209MiB |
|    0   N/A  N/A     57120      C   python                                      985MiB |
+---------------------------------------------------------------------------------------+
@bobqianic
Collaborator

cublasGemmStridedBatchedEx requires a GPU architecture with compute capability 5.0 or higher. It's strange, though, because the Quadro M3000M has a compute capability of 5.0, so this error shouldn't be occurring.


@bobqianic bobqianic added the bug Something isn't working label Dec 3, 2023
@themanyone
Author

Looking up cuBLAS error code 15:

Per the NVIDIA cuBLAS documentation, status code 15 is CUBLAS_STATUS_NOT_SUPPORTED, returned when the requested functionality is not supported on the current device or with the given parameter combination; that matches the text of the error message above. An OSError: (External) CUBLAS error(15) with the same code is reported in [PaddlePaddle/PaddleOCR#9084](https://github.com/PaddlePaddle/PaddleOCR/issues/9084). The often-suggested CUBLAS_STATUS_NOT_INITIALIZED ("failed to create cublas handle") is a different status, code 1, raised when the cuBLAS handle was never initialized.

@themanyone
Author

themanyone commented Dec 3, 2023

If I run with the ./main -ng flag, it works. And, strangely, it is much faster and still uses some of the GPU (about 132 MiB, as shown by nvidia-smi).

Verified again: running the cuBLAS-enabled version with the -ng flag is indeed over 5x faster than the one compiled without cuBLAS support.

@themanyone
Author

themanyone commented Dec 5, 2023

I neglected to mention that I had to modify the Makefile:

NVCCFLAGS = -allow-unsupported-compiler ...

Because of this:

/usr/local/cuda/include/crt/host_config.h:143:2: error: #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
  143 | #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

Also, I was using CUDA_ARCH_FLAG=compute_50 because arch=native did not work in that build. After a fresh git pull, however, that change is no longer required.

The bug is still present in the latest pull.
cuBLAS error 15 at ggml-cuda.cu:7548: the requested functionality is not supported
current device: 0

The -ng flag still works around the bug like before.

@pjuhasz

pjuhasz commented Dec 27, 2023

I have the exact same issue: the ./main program crashes with the same error message (except now it refers to ggml-cuda.cu:8456, with git version 37a70).

I have a Quadro M2000M in a Thinkpad P50.

I can also confirm that the same cuBLAS-enabled executable does not crash with the -ng switch, and that it uses the GPU to some extent and it is faster than the regular CPU-only binary (except on my machine the speedup is only 2-3x).

@pjuhasz

pjuhasz commented Dec 27, 2023

Possibly related: ggerganov/llama.cpp#4395

@bobqianic
Collaborator

I can also confirm that the same cuBLAS-enabled executable does not crash with the -ng switch, and that it uses the GPU to some extent and it is faster than the regular CPU-only binary (except on my machine the speedup is only 2-3x).

See #1688 (comment)

@themanyone
Author

The error, as well as the need for the -ng workaround, is fixed as of release v1.5.3. Closing.
