[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch #12139
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can trigger it in one of several ways.
The branch was force-pushed from 2bbc892 to d622b98.
The branch was force-pushed from 0952f55 to 81c362d.
This pull request has merge conflicts that must be resolved before it can be merged.
/ready
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM. Leaving this to @youkaichao.
Use tuple with double quotes. Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>. Signed-off-by: zhouyu5 <yu.zhou@intel.com>.
Since this PR only changes the HPU code and adds a new env var, I think it's fine from my perspective.
CI is not giving stable results; I will trigger it again.
Feel free to send me your email so I can add you to our Buildkite org. That way you can retry each job instead of retrying the whole build, which is costly.
Thank you, @khluu! Please add me: andy.yuzhou@gmail.com
I sent an invite.
All tests passed. Could you help merge it, @comaniac?
This PR adds contiguous cache fetching to avoid the costly gather operation on Gaudi3. It requires changes in vllm-hpu-extension (HabanaAI/vllm-hpu-extension#17) to work.
It introduces redundant calculations in the decoding phase. The feature improves the performance of all tested workloads over the entire benchmark (5-12%) on Gaudi3, and a later commit further improves the performance of this feature (9-22%). The feature negatively impacts performance on Gaudi2.
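For illustration only, here is a minimal PyTorch sketch of the trade-off; the tensor names, shapes, and block layout below are hypothetical and are not the actual vLLM HPU kernels:

```python
import torch

# Hypothetical paged KV cache: [num_blocks, block_size, num_heads, head_dim].
kv_cache = torch.randn(64, 16, 4, 8)

# Blocks actually owned by one sequence (hypothetical block table entries).
block_ids = torch.tensor([3, 17, 21, 40])

# Gather-based fetch: pick exactly the needed blocks.
# index_select is the kind of gather operation reported to be costly on Gaudi3.
gathered = kv_cache.index_select(0, block_ids)  # shape [4, 16, 4, 8]

# Contiguous fetch: read one contiguous range that covers those blocks.
# No gather is needed, but blocks 4..16, 18..20, and 22..39 are fetched and
# later computed over as well -- the "redundant calculations" mentioned above.
lo, hi = int(block_ids.min()), int(block_ids.max()) + 1
contiguous = kv_cache[lo:hi]  # shape [38, 16, 4, 8]
```

The sketch shows why the feature helps when gathers are expensive (Gaudi3) but can hurt when they are not (Gaudi2): the contiguous read wastes compute on blocks the sequence does not use.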
Set the VLLM_CONTIGUOUS_PA=true environment variable to enable it.
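As a usage sketch (assuming a Gaudi-enabled vLLM build; the model name is an arbitrary placeholder, and the exact point at which the HPU backend reads the variable may differ):

```python
import os

# Set the flag before vLLM is imported so the HPU backend sees it when it
# reads its environment configuration.
os.environ["VLLM_CONTIGUOUS_PA"] = "true"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model for illustration
outputs = llm.generate(["Hello, Gaudi!"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```

Equivalently, the variable can be exported in the shell before launching the server.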