[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch #12139
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can trigger it in one of several ways.
The branch was force-pushed from 2bbc892 to d622b98.
The branch was force-pushed from 0952f55 to 81c362d.
This pull request has merge conflicts that must be resolved before it can be merged.
/ready
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM. Leaving this to @youkaichao.
Use tuple with double quotes. Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>. Signed-off-by: zhouyu5 <yu.zhou@intel.com>.
Since this PR only changes the HPU code and adds a new env var, I think it's fine from my perspective.
CI is not giving stable results; I will trigger it again.
Feel free to send me your email so I can add you to our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly.
Thank you, @khluu! Please add me: andy.yuzhou@gmail.com
I sent an invite.
All tests passed. Could you help merge it, @comaniac?
This PR adds contiguous cache fetching to avoid the costly gather operation on Gaudi3. It requires changes in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#17) to work.
It introduces redundant calculations in the decoding phase. The feature improves the performance of all tested workloads over the entire benchmark (5-12%) on Gaudi3, and a later commit further improves the performance of this feature (9-22%). The feature negatively impacts performance on Gaudi2.
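For illustration only, here is a minimal PyTorch sketch of the trade-off; the tensor names, shapes, and block layout below are hypothetical and are not the actual vLLM HPU kernels:

```python
import torch

# Hypothetical paged KV cache: [num_blocks, block_size, num_heads, head_dim].
kv_cache = torch.randn(64, 16, 4, 8)

# Blocks actually owned by one sequence (hypothetical block table entries).
block_ids = torch.tensor([3, 17, 21, 40])

# Gather-based fetch: pick exactly the needed blocks.
# index_select is the kind of gather operation reported to be costly on Gaudi3.
gathered = kv_cache.index_select(0, block_ids)  # shape [4, 16, 4, 8]

# Contiguous fetch: read one contiguous range that covers those blocks.
# No gather is needed, but blocks 4..16, 18..20, and 22..39 are fetched and
# later computed over as well -- the "redundant calculations" mentioned above.
lo, hi = int(block_ids.min()), int(block_ids.max()) + 1
contiguous = kv_cache[lo:hi]  # shape [38, 16, 4, 8]
```

The sketch shows why the feature helps when gathers are expensive (Gaudi3) but can hurt when they are not (Gaudi2): the contiguous read wastes compute on blocks the sequence does not use.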
Set the VLLM_CONTIGUOUS_PA=true environment variable to enable it.
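As a usage sketch (assuming a Gaudi-enabled vLLM build; the model name is an arbitrary placeholder, and the exact point at which the HPU backend reads the variable may differ):

```python
import os

# Set the flag before vLLM is imported so the HPU backend sees it when it
# reads its environment configuration.
os.environ["VLLM_CONTIGUOUS_PA"] = "true"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
outputs = llm.generate(["Hello, Gaudi!"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```

Equivalently, the variable can be exported in the shell before launching the server.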