Update to torch==2.6.0 #12721
base: main
Conversation
Signed-off-by: mgoin <michael@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Nice, CI looks green
Shall we merge #12393 first? cc: @youkaichao
LGTM. I built vLLM by merging this PR, and it worked perfectly 🚀
Confirmed that this update will break V1 in its current state; we should wait for #12393 at least.
@mgoin can you help review and stamp that PR?
@mgoin Thanks a lot for the update. IPEX CPU w/ PT 2.6 will be released next week. Will update on this as soon as the binary is out. Thanks, -yuan
This pull request has merge conflicts that must be resolved before it can be merged.
I want to know when this PR will be merged.
Signed-off-by: mgoin <mgoin64@gmail.com>
Btw, PyTorch updated auto-functionalization, which I think will break our custom fusion passes, so we should disable it. There's an inductor config field for this called enable_auto_functionalized_v2. @mgoin do you want me to open a separate PR or can you make the change? We should add this in config.py:3109:

    if 'enable_auto_functionalized_v2' not in self.inductor_compile_config:
        self.inductor_compile_config['enable_auto_functionalized_v2'] = False
Disabling this feature doesn't fix the weight transpose problem.
This pull request has merge conflicts that must be resolved before it can be merged.
If cutlass_scaled_mm is a custom op then it's possible inductor changed the strides of the input for it. Is it possible to get something like the TORCH_LOGS from the run?
@zou3519 Yes, and thanks for taking a look! Here is one setting. This is the repro:
Is there any way to forbid inductor from changing the strides in this case? I didn't manage to dig in too far, but it looks like it's transposing the weight matrix on us.
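A minimal sketch of turning on the relevant logs for a repro like this, assuming the torch._logging.set_logs API (roughly equivalent to setting TORCH_LOGS="+inductor,output_code"); names and log choices here are illustrative, not from this PR:

    import logging
    import torch._logging

    # Sketch only: enable inductor debug logging and dump the generated
    # output code, then run the repro in the same process.
    torch._logging.set_logs(inductor=logging.DEBUG, output_code=True)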
Yes, in here change it to:

    ops.def(
        "cutlass_scaled_mm(Tensor! out, Tensor a,"
        "                  Tensor b, Tensor a_scales,"
        "                  Tensor b_scales, Tensor? bias) -> ()",
        {at::Tag::needs_fixed_stride_order});

It turns out PyTorch changed the default behavior for custom operators to "requires_contiguous", which is not the best; this tag changes the behavior.
I know what the bug here is (pytorch/pytorch#147924), trying to figure out what the best way to work around it is...

EDIT: the workaround is disabling auto_functionalized_v2, assuming that doesn't slow down your perf.

    # I promise we'll fix this asap in PyTorch core, so you can just guard on 2.6
    if torch.__version__.startswith("2.6"):
        self.inductor_compile_config['enable_auto_functionalized_v2'] = False

Are there any other torch.compile related problems I can take a look at? It's unclear to me which of the failing tests are running torch.compile and which aren't.
@zou3519 yeah, we actually found the auto-func issue separately as well, and it would break our custom passes, so we'll disable it for now. We currently have a workaround for the issue V2 fixes (manual graph fixing to remove the copies). Do we have to guard against the version, or can we just disable V2 in all situations? If the config property doesn't exist, will it just be ignored or will it fail? And I guess this additional issue means we should stick to V1 until torch 2.7 anyway. Or is there any way the promised fix gets added to a bug fix torch release?
In terms of tests, once we disable V2 as mentioned here, all tests in
It's likely the fix will be added to 2.6.1 (the 2.6 patch release).
I'm not sure how vllm's config interacts with torch._inductor.config. But if the config doesn't exist on torch._inductor.config then trying to set it will fail. It depends on whether you want the code to also work for PyTorch 2.5 (we introduced this config in PyTorch 2.6, so it'll be around for the foreseeable future).
Yeah, that's what I was wondering, so we should guard for 2.6+ if we want 2.5 to still work (seems worth it in this case).
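For illustration, a minimal sketch of such a guard (not the actual vLLM change; checking for the attribute rather than the version string is just one option):

    import torch._inductor.config as inductor_config

    # Sketch only: set the flag when this torch build exposes it (PyTorch 2.6+),
    # so the same code path still works on PyTorch 2.5.
    inductor_overrides = {}
    if hasattr(inductor_config, "enable_auto_functionalized_v2"):
        inductor_overrides["enable_auto_functionalized_v2"] = False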
Created #14306 for the scaled_mm issue
FYI, there will be no PyTorch 2.6.1:
https://dev-discuss.pytorch.org/t/no-pytorch-2-6-1-release/2817
Hi @mgoin @tlrmchlsmth, what are the remaining blockers for this PR (other than #14306)?
@ProExpertProg do we still need to make the change to disable the V2 auto-functionalization?
Yes, we should. I can post a PR if needed.
Would be great to get this in quickly by tomorrow, so we can make it part of the v0.8.0 release.
Yes, you'll need to disable v2 functionalization, I don't know how to work around this otherwise.
Any fix?
This only updates for CUDA. Successfully built locally on an H100 CUDA 12.5 system and tested with:
vllm serve meta-llama/Llama-3.1-8B-Instruct
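As a quick sanity check against the served model, something like this works (a sketch assuming the default vllm serve port 8000 and the openai Python client installed):

    from openai import OpenAI

    # Sketch only: hit the OpenAI-compatible endpoint exposed by `vllm serve`.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Say hello in one word."}],
        max_tokens=8,
    )
    print(resp.choices[0].message.content)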
We should upgrade other hardware backends separately. For instance, CPU is blocked by IPEX in the Dockerfile.cpu
FIX #12719