Retire the ggml_mul_mat() branch for transposed src0 #500

ggerganov · 2023-03-25T15:59:58Z

It can always be made contiguous with ggml_cpy()
The code is now simplified
The results are deterministic with respect to the number of threads
Also added an ARM_NEON implementation of dequantize_row_q4_0()
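
To illustrate the first point, here is a minimal sketch of what "making a transposed operand contiguous" means in plain C. It does not use the ggml API: `copy_to_contiguous` and its element-based stride parameters are hypothetical names chosen for this example, and the real ggml_cpy() operates on ggml tensors rather than raw pointers.

```c
#include <stddef.h>

/* Hypothetical helper (not part of ggml): copy a strided 2-D float view
 * into a densely packed row-major buffer. Strides are given in elements.
 * A transposed view of a row-major R x C matrix has rows = C, cols = R,
 * row_stride = 1 and col_stride = C. */
void copy_to_contiguous(const float *src, size_t rows, size_t cols,
                        size_t row_stride, size_t col_stride,
                        float *dst) {
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            /* Read through the strides, write densely. */
            dst[r * cols + c] = src[r * row_stride + c * col_stride];
        }
    }
}
```

Once the operand is packed this way, a matrix-multiply routine only needs its contiguous code path, which is presumably why the dedicated transposed branch could be dropped.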


ggerganov added 2 commits March 25, 2023 17:59

Retire the ggml_mul_mat() for transposed src0

1e39d2b

- It can always be made contiguous with ggml_cpy()
- The code is now simplified
- The results are deterministic in respect to num threads

SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502)

face808

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON
* Fix dequantization - forgot to interleave the quants
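
For context on the "interleave" fix, here is a scalar sketch of 4-bit block dequantization. The layout is an assumption made for illustration, not a statement of the exact ggml format: one float scale per block of 32 quants, two quants packed per byte with the even-indexed quant in the low nibble, and each quant offset by 8. The names `block_q4_sketch`, `BLOCK_SIZE`, and `dequantize_row_sketch` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32  /* assumed number of quants per block */

/* Hypothetical block layout, for illustration only. */
typedef struct {
    float   d;                   /* per-block scale */
    uint8_t qs[BLOCK_SIZE / 2];  /* two 4-bit quants per byte */
} block_q4_sketch;

/* Scalar reference: expand n_blocks packed blocks into floats. */
void dequantize_row_sketch(const block_q4_sketch *x, float *y,
                           size_t n_blocks) {
    for (size_t i = 0; i < n_blocks; i++) {
        const float d = x[i].d;
        for (size_t l = 0; l < BLOCK_SIZE; l += 2) {
            const uint8_t b = x[i].qs[l / 2];
            const int q0 = (b & 0x0F) - 8;  /* low nibble: even index */
            const int q1 = (b >> 4) - 8;    /* high nibble: odd index */
            y[i * BLOCK_SIZE + l + 0] = (float) q0 * d;
            y[i * BLOCK_SIZE + l + 1] = (float) q1 * d;
        }
    }
}
```

Under this assumed layout, a vectorized version that extracts all low nibbles into one register and all high nibbles into another has to interleave the two before storing, otherwise the outputs land in the wrong order.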

ggerganov merged commit ecbe466 into master Mar 25, 2023

ggerganov deleted the simple-mul_mat branch March 25, 2023 17:47

sw added a commit to sw/llama.cpp that referenced this pull request Apr 4, 2023

Revert "Retire the ggml_mul_mat() branch for transposed src0 (ggml-or…

8fb772c

…g#500)" This reverts commit ecbe466.

sw added a commit to sw/llama.cpp that referenced this pull request Apr 4, 2023

Revert "Retire the ggml_mul_mat() branch for transposed src0 (ggml-or…

4b100d7

…g#500)" This reverts commit ecbe466.

sw mentioned this pull request Apr 4, 2023

Performance Discrepancy: gpt4all Faster than Optimized llama.cpp #603

Closed

sw added a commit to sw/llama.cpp that referenced this pull request Apr 4, 2023

Revert "Retire the ggml_mul_mat() branch for transposed src0 (ggml-or…

d1e461c

…g#500)" This reverts commit ecbe466.

sw added a commit to sw/llama.cpp that referenced this pull request Apr 4, 2023

Revert "Retire the ggml_mul_mat() branch for transposed src0 (ggml-or…

f747a43

…g#500)" This reverts commit ecbe466.

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
