
[CORE]Add Attention Quantization #73

Merged
merged 8 commits into vllm-project:v0.7.1-dev on Feb 17, 2025

Conversation

Angazenn
Contributor

This PR adds attention quantization interfaces, including an AscendQKVQuantAttentionMethod class that inherits from the BaseKVCacheMethod class.
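
The merged implementation lives in the vllm-ascend repository, but the description above implies a quantization method that plugs into vLLM's KV-cache quantization interface. The sketch below only mirrors that class relationship; the import path, constructor argument, and method bodies are assumptions about the interface, not the code from this PR.

```python
# Illustrative sketch only: shows the class relationship described in this PR
# (AscendQKVQuantAttentionMethod inheriting from BaseKVCacheMethod). The import
# path, signatures, and method bodies are assumptions, not the merged
# vllm-ascend implementation.
import torch

from vllm.model_executor.layers.quantization.kv_cache import BaseKVCacheMethod


class AscendQKVQuantAttentionMethod(BaseKVCacheMethod):
    """Quantized QKV/attention method for Ascend NPUs (sketch)."""

    def __init__(self, quant_config) -> None:
        super().__init__(quant_config)

    def create_weights(self, layer: torch.nn.Module) -> None:
        # Hypothetical: register a per-layer scale used to quantize the
        # query/key/value activations and the KV cache.
        layer.qkv_scale = torch.nn.Parameter(torch.ones(1),
                                             requires_grad=False)

    def apply(self, layer: torch.nn.Module, *args, **kwargs):
        # Hypothetical: quantize Q/K/V with the registered scale and invoke the
        # Ascend fused attention kernel; the real logic lives in vllm-ascend.
        raise NotImplementedError
```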

angazenn added 6 commits February 11, 2025 15:38 (each Signed-off-by: angazenn <zengyanjia@huawei.com>)
@Angazenn Angazenn changed the title Attention quant [CORE]Attention quant Feb 17, 2025
@Angazenn Angazenn changed the title [CORE]Attention quant [CORE]Add Attention Quantization Feb 17, 2025
angazenn added 2 commits February 17, 2025 19:24
@wangxiyuan wangxiyuan merged commit 36d1349 into vllm-project:v0.7.1-dev Feb 17, 2025
3 checks passed
Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Feb 21, 2025
This PR adds attention quantization interfaces, including an
AscendQKVQuantAttentionMethod class that inherits from the BaseKVCacheMethod
class.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>