Any chance of adding CLBlast support? #1059
If the authors of KoboldCpp want to contribute that change, I see no reason why it wouldn't be accepted, but the licenses are incompatible, so we cannot just take their code and merge it here. It would also help to know whether there is any performance benefit or code simplification from adding CLBlast.
Quoting from the CLBlast GitHub readme (emphasis mine):
Nvidia cuBLAS support is amazing, but it doesn't add anything for people with embedded devices (like phones or Raspberry Pis) or AMD GPUs. I would only agree that it is not worth adding if it has no benefit over OpenBLAS, but I doubt that. I don't have any numbers to back that up, though.
Doing a quick and dirty comparison between llama.cpp with OpenBLAS and KoboldCpp with CLBlast.
CLBlast processing time for dan.txt
Keep in mind this is on an RX 570 and not a high-end card. There are more numbers here, but I can't verify the hardware they were gathered on. The cuBLAS gains at that point seem comparable to CLBlast's; however, CLBlast in that instance already did dequantization on the GPU, while cuBLAS didn't at the time those numbers were taken. So the gap between the two is probably larger now.
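For anyone wanting to reproduce a comparison like this, the BLAS backend is chosen at build time. A rough sketch follows, using the Makefile switches llama.cpp had around this period for OpenBLAS and cuBLAS; the CLBlast flag shown is the one this port eventually landed under, so treat it as illustrative here:

```sh
# CPU BLAS via OpenBLAS
make clean && make LLAMA_OPENBLAS=1

# Nvidia GPUs via cuBLAS
make clean && make LLAMA_CUBLAS=1

# CLBlast (illustrative: the flag the port was eventually merged under)
make clean && make LLAMA_CLBLAST=1

# Then time prompt processing on the same prompt file for each build
./main -m ./models/7B/ggml-model-q4_0.bin -f dan.txt -n 16
```

Prompt processing (the initial pass over the prompt) is where BLAS matters most, which is why the comparisons in this thread report processing time rather than generation speed.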
I wrote the CLBlast code for KoboldCpp. If there's interest here, it should be easy to port; I could open a PR.
I have been playing with it on my RISC-V VisionFive 2 SBC and would like to see it incorporated and improved here.
@0cc4m please do. I think it would be a good idea to have that functionality in upstream llama.cpp rather than as a Kobold-exclusive feature. On the performance side, another user is reporting 50% gains with an Nvidia 3060 on the CLBlast Kobold code. Granted, CLBlast is twice as slow as OpenBLAS on my hardware, but I'm using an integrated Intel HD 530. While AMD and Nvidia users are likely better off with hipBLAS or cuBLAS, those with older AMD or Intel GPUs are stuck with CLBlast if they want hardware offload.
The current cuBLAS integration is very basic (awesome work to get it in, just far from being the boost it could be), so we might want to choose one path at some point if the implementations are not very compatible. I don't know much about CLBlast, but it's open source, while cuBLAS is fully closed source. If the dot-product performance is comparable, it's probably the better choice.
I basically have a working version; the mul_mat speed is only slightly slower than cuBLAS. Here are some tests of it on my RTX 3060 (the numbers at the end of the lines are m, n, and k).
However, I am observing some very strange behavior. In total, CLBlast is still much slower than cuBLAS, but when I dig deeper, I find that building with cuBLAS enabled seems to massively speed up entirely unrelated operations. I have so far not found any reason for this.
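For readers unfamiliar with CLBlast, the mul_mat offload being measured here ultimately comes down to a GEMM call through CLBlast's C API. Below is a minimal self-contained sketch of such a call; it illustrates the API shape, not the actual KoboldCpp/llama.cpp integration, and the dimensions and fill values are placeholders:

```c
// Minimal CLBlast SGEMM sketch: C = alpha * A * B + beta * C.
// Illustrative only; not the actual KoboldCpp/llama.cpp code.
// Build (paths may vary): gcc sgemm.c -lclblast -lOpenCL
#include <stdio.h>
#include <stdlib.h>
#include <clblast_c.h>  // also pulls in the OpenCL C headers

int main(void) {
    const size_t m = 64, n = 64, k = 64;  // placeholder dimensions

    // Pick the first GPU on the first platform and create a queue.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);

    // Host matrices (row-major), filled with a trivial pattern.
    float *A = malloc(m * k * sizeof(float));
    float *B = malloc(k * n * sizeof(float));
    float *C = malloc(m * n * sizeof(float));
    for (size_t i = 0; i < m * k; ++i) A[i] = 1.0f;
    for (size_t i = 0; i < k * n; ++i) B[i] = 2.0f;

    // Device buffers, with the inputs copied over.
    cl_mem a_buf = clCreateBuffer(context, CL_MEM_READ_ONLY,  m * k * sizeof(float), NULL, NULL);
    cl_mem b_buf = clCreateBuffer(context, CL_MEM_READ_ONLY,  k * n * sizeof(float), NULL, NULL);
    cl_mem c_buf = clCreateBuffer(context, CL_MEM_READ_WRITE, m * n * sizeof(float), NULL, NULL);
    clEnqueueWriteBuffer(queue, a_buf, CL_TRUE, 0, m * k * sizeof(float), A, 0, NULL, NULL);
    clEnqueueWriteBuffer(queue, b_buf, CL_TRUE, 0, k * n * sizeof(float), B, 0, NULL, NULL);

    // The GEMM itself: CLBlast selects a kernel tuned for the device.
    cl_event event = NULL;
    CLBlastStatusCode status = CLBlastSgemm(
        CLBlastLayoutRowMajor, CLBlastTransposeNo, CLBlastTransposeNo,
        m, n, k,
        1.0f, a_buf, 0, k,   // A is m x k, leading dimension k
              b_buf, 0, n,   // B is k x n, leading dimension n
        0.0f, c_buf, 0, n,   // C is m x n, leading dimension n
        &queue, &event);

    if (status == CLBlastSuccess) {
        clWaitForEvents(1, &event);
        clEnqueueReadBuffer(queue, c_buf, CL_TRUE, 0, m * n * sizeof(float), C, 0, NULL, NULL);
        printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0f * k);
        clReleaseEvent(event);
    } else {
        fprintf(stderr, "CLBlastSgemm failed: %d\n", status);
    }

    clReleaseMemObject(a_buf); clReleaseMemObject(b_buf); clReleaseMemObject(c_buf);
    clReleaseCommandQueue(queue); clReleaseContext(context);
    free(A); free(B); free(C);
    return 0;
}
```

The device-tuned kernels are what make CLBlast viable on hardware cuBLAS cannot target at all: Intel iGPUs, older AMD cards, and embedded OpenCL devices.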
@slaren Do you have any idea what causes the cuBLAS build to accelerate, for example, the RMS norm by a factor of 10, even though this function runs entirely on the CPU? Even my CLBlast version is slightly faster there than OpenBLAS, but that difference is small enough to be run-to-run variance. Is there something wrong with the perf measurements of the library?
I don't see any reason for that; I would be inclined to believe that there is a measurement error somewhere.
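One quick way to check for a measurement artifact like this is to time the op in isolation over many repetitions and compare medians across builds, rather than trusting a single per-op counter. A minimal harness sketch follows (op_under_test is a hypothetical stand-in, not a real llama.cpp function):

```c
// Timing harness sketch: the median over many runs is robust to scheduler noise.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define RUNS 101

static void op_under_test(void) {
    // Hypothetical stand-in for the op being measured (e.g. an RMS norm).
}

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec * 1e-6;
}

static int cmp_double(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

int main(void) {
    double samples[RUNS];
    for (int i = 0; i < RUNS; ++i) {
        double t0 = now_ms();
        op_under_test();
        samples[i] = now_ms() - t0;
    }
    qsort(samples, RUNS, sizeof(double), cmp_double);
    printf("median %.4f ms (min %.4f, max %.4f)\n",
           samples[RUNS / 2], samples[0], samples[RUNS - 1]);
    return 0;
}
```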
@slaren @0cc4m We've solved the issue. Apparently there was code in llama.cpp that switched the graph to single-threaded mode during BLAS calculations: understandable for OpenBLAS, but unnecessary for GPU-accelerated approaches, and that behavior carried over when cuBLAS was added. Now that it is fixed, CLBlast performs nearly on par with cuBLAS (still slightly slower). Edit: If anyone wants to try out our implementation, check out KoboldCpp.
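To make the fix concrete, the scheduling decision in question looks roughly like the sketch below (hypothetical names; the real logic lives in llama.cpp's graph execution code). A CPU BLAS library such as OpenBLAS runs its own worker threads, so dropping the graph to one thread avoids oversubscription, but with a GPU backend that restriction only starves every other op in the graph:

```c
#include <stdbool.h>

// Sketch of the thread-count decision described above (hypothetical names).
static int threads_for_node(bool node_uses_blas, bool blas_runs_on_gpu, int n_threads) {
    if (node_uses_blas && !blas_runs_on_gpu) {
        // CPU BLAS (e.g. OpenBLAS) spawns its own worker threads, so the
        // graph itself should run single-threaded to avoid oversubscription.
        return 1;
    }
    // GPU BLAS (cuBLAS/CLBlast): the multiply runs on the GPU, so the CPU
    // threads should stay available for the rest of the graph.
    return n_threads;
}
```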
This is great news! Not everyone has Nvidia cards, and OpenCL supports even fossilized dinosaur bones.
Since the latest release added support for cuBLAS, is there any chance of adding CLBlast?
KoboldCpp (which, as I understand it, also uses llama.cpp) already has it, so it shouldn't be that hard.