ggml : add AVX support based on AVX2 code #1430
Conversation
@katsu560 Thank you, this is working fine.
However, for ggml_vec_dot_q4_1_q8_1 and ggml_vec_dot_q8_0_q8_0, only a single line differs between the AVX and AVX2 versions; I think those should be merged.
Also please learn to use Git and PRs and don't open a new PR for every modification on the same topic.
It seems like you should be able to use the following trick if only one line differs:
#elif defined(__AVX__) || defined(__AVX2__)
//
// common code for both AVX1 and AVX2
//
#if defined(__AVX__)
//
// code specific for AVX1
//
#else // defined(__AVX2__)
//
// code specific for AVX2
//
#endif
//
// common code for both AVX1 and AVX2
//
#elif ...
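As a concrete sketch of that layout, here is a minimal, hypothetical example (the function name add8 and the integer-add workload are illustrative, not taken from ggml) where only the 256-bit integer addition differs between AVX1 and AVX2, since plain AVX lacks most 256-bit integer instructions:

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: add two arrays of 8 ints. Everything except the
 * 256-bit integer add is shared between the AVX1 and AVX2 paths, so only
 * that one spot needs an inner #if. */
static void add8(const int *a, const int *b, int *out) {
#if defined(__AVX__) || defined(__AVX2__)
    /* common code for both AVX1 and AVX2 */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
#if defined(__AVX2__)
    /* AVX2 has a native 256-bit integer add */
    __m256i vc = _mm256_add_epi32(va, vb);
#else /* plain AVX: split into two 128-bit halves */
    __m128i lo = _mm_add_epi32(_mm256_castsi256_si128(va),
                               _mm256_castsi256_si128(vb));
    __m128i hi = _mm_add_epi32(_mm256_extractf128_si256(va, 1),
                               _mm256_extractf128_si256(vb, 1));
    __m256i vc = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
#endif
    /* common code again for both paths */
    _mm256_storeu_si256((__m256i *)out, vc);
#else
    /* scalar fallback so this sketch builds on any target */
    for (int i = 0; i < 8; i++) out[i] = a[i] + b[i];
#endif
}
```

The split/recombine in the AVX1 branch (_mm256_extractf128_si256 / _mm256_insertf128_si256) is the same general technique this PR uses to back-port the AVX2 code.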
I don't know if it's really worthwhile to add SIMD optimizations for the quantization of the Q4, Q5 formats. These were removed with #1405. I think we'd want optimized dot-product routines, and the Q8 quantizations.
Yes, let's not complicate the implementation with Q4 and Q5 SIMD quantize / dequantize for now.
I pushed the code to merge the AVX2/AVX paths. Regarding adding SIMD optimizations for the quantization of the Q4 and Q5 formats, I reopened a new PR on the latest master.
I added AVX support, based on the AVX2 code, to the functions below:
static inline __m256i bytes_from_bits_32(const uint8_t * x)
static inline __m256i bytes_from_nibbles_32(const uint8_t * rsi)
static inline __m256 sum_i16_pairs_float(const __m128i xh, const __m128i xl)
static inline __m256 mul_sum_i8_pairs_float(const __m256i x, const __m256i y)
static void ggml_vec_dot_q4_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q5_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
static void ggml_vec_dot_q8_0_q8_0(const int n, float * restrict s, const void * restrict vx, const void * restrict vy)
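For reference, the dot products above all compute the same thing at heart: a sum over blocks of (scale_x * scale_y * integer dot of the quantized values). A scalar sketch for the q8_0 case is below; it is simplified (the real ggml block_q8_0 stores the scale as fp16, here a float is used for clarity), and the _sketch names are mine, not from ggml:

```c
#include <stdint.h>

#define QK8_0 32

/* Simplified sketch of the Q8_0 block layout; real ggml stores d as fp16. */
typedef struct {
    float  d;            /* block scale */
    int8_t qs[QK8_0];    /* quantized values */
} block_q8_0_sketch;

/* Scalar reference for what ggml_vec_dot_q8_0_q8_0 computes:
 *   s = sum over blocks of d_x * d_y * dot(qs_x, qs_y)
 * The AVX/AVX2 paths in this PR vectorize exactly this loop. */
static void vec_dot_q8_0_ref(int n, float *s,
                             const block_q8_0_sketch *x,
                             const block_q8_0_sketch *y) {
    const int nb = n / QK8_0;
    float sum = 0.0f;
    for (int i = 0; i < nb; i++) {
        int sumi = 0;
        for (int j = 0; j < QK8_0; j++)
            sumi += (int)x[i].qs[j] * (int)y[i].qs[j];
        sum += x[i].d * y[i].d * (float)sumi;
    }
    *s = sum;
}
```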
Performance improved:
with a q5_0 model: 1084790.97 ms -> 156907.19 ms
with a q5_1 model: 1072326.98 ms -> 301364.32 ms
@ggerganov and @sw, please review this PR.