Support for quantized KV Cache #5

Open

zjnyly opened this issue Jan 8, 2025 · 1 comment
Comments

zjnyly commented Jan 8, 2025

Hi, have you tested the performance with a quantized KV Cache? Is it possible to keep reasonably high performance under INT8 quantization?

@dreaming-panda
Contributor

I evaluated INT8 quantization with quanto. The accuracy is good (on Llama3.1-8B), but I currently do not have the bandwidth to implement this on the CPU (and I do not know how to do it except by calling quanto, which is not very fast on the CPU).
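For reference, here is a minimal sketch of what "calling quanto" for an INT8 KV cache can look like through Hugging Face transformers' quantized-cache support (not this repo's code). The model name and generation settings are illustrative assumptions; the `cache_implementation="quantized"` / `cache_config` options require a recent transformers version and the `optimum-quanto` package.

```python
# Sketch: INT8-quantized KV cache via transformers' quanto backend.
# Assumes: transformers >= 4.45, optimum-quanto installed, model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

# cache_implementation="quantized" quantizes key/value tensors as they are
# appended to the cache; nbits=8 selects INT8 with the quanto backend.
out = model.generate(
    **inputs,
    max_new_tokens=64,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 8},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

On GPU this is a straightforward way to reproduce the accuracy check mentioned above; on CPU, quanto's dequantization path is the slow part the comment refers to.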
