CUDA: add bf16 and f32 support to cublas_mul_mat_batched #14361
base: master
Conversation
Force-pushed from 79ca9fd to 789c697
Force-pushed from fe14807 to b7225ec
mul_mat_batched with bf16 is failing for
Force-pushed from 2b83788 to 87aeacf
@JohannesGaessler the mul-mat tests in bf16 fail for Vulkan because of an assert.
Sorry, I didn't see the Vulkan comment. The problem, from what I can tell, is that the logic in
I think this was supposed to work, but after just changing the assert I see the test fail. I'll debug it.
#14378 should fix the new tests.
Add bf16 and f32 support to batched cuBLAS mul mat. This speeds up llama-bench when running with
--cache_type_v bf16 --cache_type_k bf16