ggml : add AVX support based on AVX2 code #1430

katsu560 · 2023-05-13T13:47:55Z

I rereopen new PR on latest master.
I added AVX support based on AVX2 code to below functions.
static inline __m256i bytes_from_bits_32(const uint8_t * x)
static inline __m256i bytes_from_nibbles_32(const uint8_t * rsi)
static inline __m256 sum_i16_pairs_float(const __m128i xh, const __m128i xl)
static inline __m256 mul_sum_i8_pairs_float(const __m256i x, const __m256i y)
static void ggml_vec_dot_q4_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q5_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q8_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)

performance improved:
with q5_0 model: 1084790.97 ms -> 156907.19 ms
with q5_1 model: 1072326.98 ms -> 301364.32 ms

ggerganov and sw, please confirm this PR.

sw

@katsu560 Thank you, this is working fine.

However, for ggml_vec_dot_q4_1_q8_1 and ggml_vec_dot_q8_0_q8_0, only a single line is different for AVX and AVX2, I think those should be merged.

Also please learn to use Git and PRs and don't open a new PR for every modification on the same topic.

prusnak · 2023-05-13T17:42:04Z

only a single line is different for AVX and AVX2, I think those should be merged.

It seems like you should be able to use the following trick if only one line differs:

#elif defined(__AVX__) || defined(__AVX2__)
//
// common code for both AVX1 and AVX2
//
#if defined(__AVX__)
//
// code specific for AVX1
//
#else // defined(__AVX2__)
//
// code specific for AVX2
//
#endif
//
// common code for both AVX1 and AVX2
//
#elif ...

sw · 2023-05-13T20:20:25Z

I don't know if it's really worthwhile to add SIMD optimizations for the quantization of the Q4, Q5 formats. These were removed with #1405. I think we'd want optimized dot-product routines, and the Q8 quantizations.

ggerganov · 2023-05-13T21:15:21Z

I don't know if it's really worthwhile to add SIMD optimizations for the quantization of the Q4, Q5 formats

Yes, let's not complicate the implementation with Q4 and Q5 SIMD quantize / dequantize for now.
We can SIMD-ify these at a later stage, when we are confident which quantization formats will remain

…_0_q8_0

katsu560 · 2023-05-13T21:54:07Z

I pushed the code to merge AVX2/AVX code.
I'd like you to confirm and merge to your code.

Regarding the adding SIMD optimizations for the quantization of the Q4, Q5 formats,
I understood you withhold the merging these optimizations right now.

ggml : add AVX support based on AVX2 code

7b6f3f3

katsu560 mentioned this pull request May 13, 2023

ggml : add AVX support based on AVX2 code #1376

Closed

ggerganov requested a review from sw May 13, 2023 13:50

sw suggested changes May 13, 2023

View reviewed changes

ggml : add AVX support to quantize_row_q5_0, quantize_row_q5_1

61a3046

ggml : merge AVX2/AVX code in ggml_vec_dot_q4_1_q8_1, ggml_vec_dot_q8…

81b65da

…_0_q8_0

ggml : delete SIMD optimizations for the quantization of the Q5 format

262a757

katsu560 requested a review from sw May 14, 2023 09:38

sw approved these changes May 14, 2023

View reviewed changes

sw merged commit 60f8c36 into ggml-org:master May 14, 2023

Gnomesenpai mentioned this pull request Sep 3, 2024

llama-server <embedding> exited with status code -1 TabbyML/tabby#3056

Open

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml : add AVX support based on AVX2 code #1430

ggml : add AVX support based on AVX2 code #1430

katsu560 commented May 13, 2023

sw left a comment •

edited

Loading

prusnak commented May 13, 2023

sw commented May 13, 2023

ggerganov commented May 13, 2023

katsu560 commented May 13, 2023

ggml : add AVX support based on AVX2 code #1430

ggml : add AVX support based on AVX2 code #1430

Conversation

katsu560 commented May 13, 2023

sw left a comment • edited Loading

Choose a reason for hiding this comment

prusnak commented May 13, 2023

sw commented May 13, 2023

ggerganov commented May 13, 2023

katsu560 commented May 13, 2023

sw left a comment •

edited

Loading