Update to torch==2.6.0 #12721
base: main
Conversation
Signed-off-by: mgoin <michael@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Nice, CI looks green
Shall we merge #12393 first? cc: @youkaichao
LGTM. I built vLLM by merging this PR, and it worked perfectly 🚀
Confirmed that this update will break V1 in its current state; we should wait for #12393 at least.
@mgoin can you help review and stamp that PR?
@mgoin Thanks a lot for the update. IPEX CPU w/ PT 2.6 will be released next week. Will update on this as soon as the binary is out. Thanks, -yuan
This pull request has merge conflicts that must be resolved before it can be merged.
I want to know when this PR will be merged.
Signed-off-by: mgoin <mgoin64@gmail.com>
Btw, PyTorch updated auto-functionalization, which I think will break our custom fusion passes, so we should disable it. There's an inductor config field for this called enable_auto_functionalized_v2. @mgoin do you want me to open a separate PR or can you make the change? We should add this in config.py:3109:

    if 'enable_auto_functionalized_v2' not in self.inductor_compile_config:
        self.inductor_compile_config['enable_auto_functionalized_v2'] = False
Disabling this feature doesn't fix the weight transpose problem.
This pull request has merge conflicts that must be resolved before it can be merged.
If cutlass_scaled_mm is a custom op then it's possible inductor changed the strides of the input for it. Is it possible to get something like the TORCH_LOGS from the run?
@zou3519 Yes, and thanks for taking a look! Here is one setting. This is the repro:
Is there any way to forbid inductor from changing the strides in this case? I didn't manage to dig in too far, but it looks like it's transposing the weight matrix on us.
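A minimal sketch of turning on the relevant logs for a repro like this, assuming the torch._logging.set_logs API (roughly equivalent to setting TORCH_LOGS="+inductor,output_code"); names and log choices here are illustrative, not from this PR:

    import logging
    import torch._logging

    # Sketch only: enable inductor debug logging and dump the generated
    # output code, then run the repro in the same process.
    torch._logging.set_logs(inductor=logging.DEBUG, output_code=True)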
Yes, in here change it to:

    ops.def(
        "cutlass_scaled_mm(Tensor! out, Tensor a,"
        "                  Tensor b, Tensor a_scales,"
        "                  Tensor b_scales, Tensor? bias) -> ()",
        {at::Tag::needs_fixed_stride_order});

It turns out PyTorch changed the default behavior for custom operators to "requires_contiguous", which is not the best; this tag changes the behavior.
I know what the bug here is (pytorch/pytorch#147924), trying to figure out what the best way to work around it is...

EDIT: the workaround is disabling auto_functionalized_v2, assuming that doesn't slow down your perf.

    # I promise we'll fix this asap in PyTorch core, so you can just guard on 2.6
    if torch.__version__.startswith("2.6"):
        self.inductor_compile_config['enable_auto_functionalized_v2'] = False

Are there any other torch.compile related problems I can take a look at? It's unclear to me which of the failing tests are running torch.compile and which aren't.
@zou3519 yeah, we actually found the auto-func issue separately as well, and it would break our custom passes, so we'll disable it for now. We currently have a workaround for the issue V2 fixes (manual graph fixing to remove the copies). Do we have to guard against the version, or can we just disable V2 in all situations? If the config property doesn't exist, will it just be ignored or will it fail? And I guess this additional issue means we should stick to V1 until torch 2.7 anyway. Or is there any way the promised fix gets added to a bug fix torch release?
In terms of tests, once we disable V2 as mentioned here, all tests in
It's likely the fix will be added to 2.6.1 (the 2.6 patch release).
I'm not sure how vllm's config interacts with torch._inductor.config. But if the config doesn't exist on torch._inductor.config then trying to set it will fail. It depends on whether you want the code to also work for PyTorch 2.5 (we introduced this config in PyTorch 2.6, so it'll be around for the foreseeable future).
Yeah, that's what I was wondering, so we should guard for 2.6+ if we want 2.5 to still work (seems worth it in this case).
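For illustration, a minimal sketch of such a guard (not the actual vLLM change; checking for the attribute rather than the version string is just one option):

    import torch._inductor.config as inductor_config

    # Sketch only: set the flag when this torch build exposes it (PyTorch 2.6+),
    # so the same code path still works on PyTorch 2.5.
    inductor_overrides = {}
    if hasattr(inductor_config, "enable_auto_functionalized_v2"):
        inductor_overrides["enable_auto_functionalized_v2"] = False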
Created #14306 for the scaled_mm issue
FYI, there will be no PyTorch 2.6.1:
https://dev-discuss.pytorch.org/t/no-pytorch-2-6-1-release/2817
Hi @mgoin @tlrmchlsmth, what are the remaining blockers for this PR (other than #14306)?
@ProExpertProg do we still need to make the change to disable the V2 auto-functionalization?
Yes, we should. I can post a PR if needed.
Would be great to get this in quickly by tomorrow, so we can make it part of the v0.8.0 release.
Yes, you'll need to disable v2 functionalization, I don't know how to work around this otherwise.
Any fix?
This only updates for CUDA. Successfully built locally on an H100 CUDA 12.5 system and tested with:
vllm serve meta-llama/Llama-3.1-8B-Instruct
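As a quick sanity check against the served model, something like this works (a sketch assuming the default vllm serve port 8000 and the openai Python client installed):

    from openai import OpenAI

    # Sketch only: hit the OpenAI-compatible endpoint exposed by `vllm serve`.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one word."}],
        max_tokens=8,
    )
    print(resp.choices[0].message.content)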
We should upgrade other hardware backends separately. For instance, CPU is blocked by IPEX in the Dockerfile.cpu
FIX #12719