
[CORE]Add Attention Quantization #73

Merged
merged 8 commits into vllm-project:v0.7.1-dev on Feb 17, 2025

Conversation

Angazenn
Contributor

This PR adds attention quantization interfaces, including an AscendQKVQuantAttentionMethod class that inherits from the BaseKVCacheMethod class.
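
The merged implementation lives in the vllm-ascend repository, but the description above implies a quantization method that plugs into vLLM's KV-cache quantization interface. The sketch below only mirrors that class relationship; the import path, constructor argument, and method bodies are assumptions about the interface, not the code from this PR.

```python
# Illustrative sketch only: shows the class relationship described in this PR
# (AscendQKVQuantAttentionMethod inheriting from BaseKVCacheMethod). The import
# path, signatures, and method bodies are assumptions, not the merged
# vllm-ascend implementation.
import torch

from vllm.model_executor.layers.quantization.kv_cache import BaseKVCacheMethod


class AscendQKVQuantAttentionMethod(BaseKVCacheMethod):
    """Quantized QKV/attention method for Ascend NPUs (sketch)."""

    def __init__(self, quant_config) -> None:
        super().__init__(quant_config)

    def create_weights(self, layer: torch.nn.Module) -> None:
        # Hypothetical: register a per-layer scale used to quantize the
        # query/key/value activations and the KV cache.
        layer.qkv_scale = torch.nn.Parameter(torch.ones(1),
                                             requires_grad=False)

    def apply(self, layer: torch.nn.Module, *args, **kwargs):
        # Hypothetical: quantize Q/K/V with the registered scale and invoke the
        # Ascend fused attention kernel; the real logic lives in vllm-ascend.
        raise NotImplementedError
```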

angazenn added 6 commits February 11, 2025 15:38 (each Signed-off-by: angazenn <zengyanjia@huawei.com>)
@Angazenn Angazenn changed the title Attention quant [CORE]Attention quant Feb 17, 2025
@Angazenn Angazenn changed the title [CORE]Attention quant [CORE]Add Attention Quantization Feb 17, 2025
angazenn added 2 commits February 17, 2025 19:24
@wangxiyuan wangxiyuan merged commit 36d1349 into vllm-project:v0.7.1-dev Feb 17, 2025
3 checks passed
Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Feb 21, 2025
This PR adds attention quantization interfaces, including an
AscendQKVQuantAttentionMethod class that inherits from the BaseKVCacheMethod
class.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>