Retire the ggml_mul_mat() branch for transposed src0 #500

Merged: 2 commits, Mar 25, 2023

@ggerganov (Member) commented Mar 25, 2023

  • A transposed src0 can always be made contiguous with ggml_cpy()
  • The code is now simpler
  • The results are deterministic with respect to the number of threads
  • Also added an ARM_NEON implementation of dequantize_row_q4_0()
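
To illustrate the first point, here is a minimal sketch in plain C of what "making a transposed matrix contiguous" amounts to: reading through a strided view and writing the elements out in row-major order. It does not use the ggml API; `copy_to_contiguous` and its stride parameters (expressed in elements, not bytes) are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ggml: copy a 2-D strided view of `src`
 * into the contiguous row-major buffer `dst` (rows * cols floats).
 * Strides are counted in elements. */
static void copy_to_contiguous(const float *src, size_t rows, size_t cols,
                               size_t row_stride, size_t col_stride,
                               float *dst) {
    assert(src != NULL && dst != NULL);
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            /* Read through the view, write densely. */
            dst[r * cols + c] = src[r * row_stride + c * col_stride];
        }
    }
}
```

A transposed view of a row-major 2x3 matrix is a 3x2 view with `row_stride = 1` and `col_stride = 3`. Once it has been copied this way, a matrix-multiplication routine only has to handle the contiguous case, which is presumably why the separate branch for transposed src0 could be dropped.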

* Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON

* Fix dequantization - forgot to interleave the quants
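
The interleaving mentioned in the fix can be seen in a scalar reference version. The sketch below assumes a Q4_0 layout of one float scale per block of 32 values, followed by 16 bytes that each pack two 4-bit quants (low nibble first), with values recovered as `(q - 8) * d`. The struct `block_q4_0_sketch` and the function name are hypothetical and only serve this illustration; the actual layout in the repository at this commit may differ.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* values per block (assumed) */

/* Assumed block layout for illustration: one scale plus packed 4-bit quants. */
typedef struct {
    float   d;          /* scale */
    uint8_t qs[QK / 2]; /* two quants per byte */
} block_q4_0_sketch;

/* Scalar sketch: expand k quantized values from x into floats in y. */
static void dequantize_row_q4_0_sketch(const block_q4_0_sketch *x, float *y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK; /* number of blocks */
    for (int i = 0; i < nb; i++) {
        const float d = x[i].d;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t vi  = x[i].qs[l / 2];
            const int     vi0 = vi & 0x0F; /* low nibble  -> element l     */
            const int     vi1 = vi >> 4;   /* high nibble -> element l + 1 */
            y[i * QK + l + 0] = (float)(vi0 - 8) * d;
            y[i * QK + l + 1] = (float)(vi1 - 8) * d;
        }
    }
}
```

Under this assumed layout, a SIMD version that extracts all low nibbles into one vector and all high nibbles into another has to interleave the two vectors before storing, otherwise the output order will not match the scalar loop.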
@ggerganov ggerganov merged commit ecbe466 into master Mar 25, 2023
@ggerganov ggerganov deleted the simple-mul_mat branch March 25, 2023 17:47
sw added a commit to sw/llama.cpp that referenced this pull request Apr 4, 2023