Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch #12139

Merged
merged 28 commits into from
Feb 19, 2025

Conversation

zhouyu5
Copy link
Contributor

@zhouyu5 zhouyu5 commented Jan 17, 2025

Contiguous cache fetching to avoid using costly gather operation on Gaudi3. Requires changes in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#17) to work.

Introduces redundant calculations in decoding phase. Feature improves the performance of all tested workloads over the entire benchmark (5-12%) on Gaudi3. commit further improves the performance of this feature (9-22%). Feature negatively impacts the performance of Gaudi2.

Use VLLM_CONTIGUOUS_PA=true environment variable to enable.

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@zhouyu5 zhouyu5 changed the title [Hardware][Gaudi] Support Contiguous PA [Hardware][Gaudi][Feature] Support Contiguous PA Jan 17, 2025
@zhouyu5 zhouyu5 force-pushed the hpu_cont_pa branch 2 times, most recently from 2bbc892 to d622b98 Compare January 17, 2025 09:10
zhouyu5 and others added 9 commits January 17, 2025 11:24
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@zhouyu5 zhouyu5 force-pushed the hpu_cont_pa branch 2 times, most recently from 0952f55 to 81c362d Compare January 23, 2025 08:50
Copy link

mergify bot commented Jan 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhouyu5.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 23, 2025
@mergify mergify bot removed the needs-rebase label Jan 23, 2025
@zhouyu5 zhouyu5 marked this pull request as ready for review January 23, 2025 08:59
@zhouyu5
Copy link
Contributor Author

zhouyu5 commented Jan 23, 2025

/ready

Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Copy link

mergify bot commented Feb 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhouyu5.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 8, 2025
@mergify mergify bot removed the needs-rebase label Feb 13, 2025
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@zhouyu5 zhouyu5 changed the title [Hardware][Gaudi][Feature] Support Contiguous PA [Hardware][Gaudi][Feature] Support Contiguous Cache Fetch Feb 13, 2025
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@comaniac comaniac added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 17, 2025
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Copy link
Collaborator

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Leave to @youkaichao

Use tuple with double quotes

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@youkaichao
Copy link
Member

LGTM. Leave to @youkaichao

since this pr only changes the hpu code, and add a new env var, I think it's fine from my perspective.

Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@zhouyu5
Copy link
Contributor Author

zhouyu5 commented Feb 18, 2025

CI not giving stable results, will trigger it again.

Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@khluu
Copy link
Collaborator

khluu commented Feb 18, 2025

CI not giving stable results, will trigger it again.

feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly.
If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.

@zhouyu5
Copy link
Contributor Author

zhouyu5 commented Feb 18, 2025

feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.

Thank you~ @khluu Please add me: andy.yuzhou@gmail.com

Signed-off-by: zhouyu5 <yu.zhou@intel.com>
@khluu
Copy link
Collaborator

khluu commented Feb 19, 2025

feel free to send me your email so I can add you in our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly. If the CI result is still not stable after retrying, please just comment here and your PR reviewers can decide whether it should be force-merged.

Thank you~ @khluu Please add me: andy.yuzhou@gmail.com

I sent an invite

@zhouyu5
Copy link
Contributor Author

zhouyu5 commented Feb 19, 2025

All test passed, could you help merge it? @comaniac

@khluu khluu merged commit d0a7a27 into vllm-project:main Feb 19, 2025
46 checks passed
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2025
…ct#12139)

Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
…ct#12139)

Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Mar 3, 2025
…ct#12139)

Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Mar 5, 2025
…ct#12139)

Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Linkun Chen <github@lkchen.net>
Said-Akbar pushed a commit to Said-Akbar/vllm-rocm that referenced this pull request Mar 7, 2025
…ct#12139)

Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: saeediy <saidakbarp@gmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants