
Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 #4996

Merged · 1 commit · Jan 17, 2024
Conversation

ikawrakow (Contributor)

I missed this tweak when adding Q2_K_S.

With this change, model size for Mistral-7B increases by only ~30 MB (0.03 bpw) while

  • Perplexity for a context of 512 on wiki.test.raw goes down from 6.9259 to 6.7116
  • 10-shot HellaSwag score after 2000 tasks increases by 0.95 +/- 0.42.

@ggerganov ggerganov merged commit 2b3a665 into master Jan 17, 2024
41 of 46 checks passed
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>