Update to torch==2.6.0 #12721

Open · wants to merge 14 commits into base: main

Conversation

mgoin
Member

@mgoin mgoin commented Feb 4, 2025

This only updates CUDA. Successfully built locally on an H100 CUDA 12.5 system and tested with vllm serve meta-llama/Llama-3.1-8B-Instruct.

We should upgrade other hardware backends separately. For instance, CPU is blocked by IPEX in Dockerfile.cpu.

FIX #12719

Signed-off-by: mgoin <michael@neuralmagic.com>
@mgoin mgoin requested a review from tlrmchlsmth as a code owner February 4, 2025 01:44

github-actions bot commented Feb 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Feb 4, 2025
Signed-off-by: mgoin <michael@neuralmagic.com>
@mgoin mgoin changed the title Update to torch==2.6.0 [WIP] Update to torch==2.6.0 Feb 4, 2025
@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 4, 2025
@mgoin mgoin changed the title [WIP] Update to torch==2.6.0 Update to torch==2.6.0 Feb 4, 2025
Signed-off-by: mgoin <michael@neuralmagic.com>
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

Nice, CI looks green

@houseroad
Copy link
Contributor

Shall we merge #12393 first? cc: @youkaichao

Contributor

@fialhocoelho fialhocoelho left a comment

LGTM. I built vLLM with this PR merged, and it worked perfectly 🚀

@mgoin
Member Author

mgoin commented Feb 4, 2025

Confirmed that this update will break V1 in its current state; we should wait for #12393 at least.

VLLM_USE_V1=1 vllm serve meta-llama/Llama-3.1-8B-Instruct
...
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/code/vllm/vllm/compilation/backends.py", line 616, in __call__
ERROR 02-04 15:27:21 core.py:210]     PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile,
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/code/vllm/vllm/compilation/backends.py", line 424, in run
ERROR 02-04 15:27:21 core.py:210]     return super().run(*fake_args)
ERROR 02-04 15:27:21 core.py:210]            ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/torch/fx/interpreter.py", line 167, in run
ERROR 02-04 15:27:21 core.py:210]     self.env[node] = self.run_node(node)
ERROR 02-04 15:27:21 core.py:210]                      ^^^^^^^^^^^^^^^^^^^
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/torch/fx/interpreter.py", line 230, in run_node
ERROR 02-04 15:27:21 core.py:210]     return getattr(self, n.op)(n.target, args, kwargs)
ERROR 02-04 15:27:21 core.py:210]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/code/vllm/vllm/compilation/backends.py", line 439, in call_module
ERROR 02-04 15:27:21 core.py:210]     compiled_graph_for_general_shape = wrap_inductor(
ERROR 02-04 15:27:21 core.py:210]                                        ^^^^^^^^^^^^^^
ERROR 02-04 15:27:21 core.py:210]   File "/home/mgoin/code/vllm/vllm/compilation/backends.py", line 254, in wrap_inductor
ERROR 02-04 15:27:21 core.py:210]     original_load = FxGraphCache.load
ERROR 02-04 15:27:21 core.py:210]                     ^^^^^^^^^^^^^^^^^
ERROR 02-04 15:27:21 core.py:210] torch._dynamo.exc.BackendCompilerFailed: backend='<vllm.compilation.backends.VllmBackend object at 0x71985bc685c0>' raised:
ERROR 02-04 15:27:21 core.py:210] AttributeError: type object 'FxGraphCache' has no attribute 'load'
ERROR 02-04 15:27:21 core.py:210] 
ERROR 02-04 15:27:21 core.py:210] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {})
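
Note: the AttributeError above comes from wrap_inductor() in vllm/compilation/backends.py grabbing FxGraphCache.load to patch it, and that attribute no longer exists in torch 2.6. A minimal illustrative guard (a sketch only, not the fix this PR ends up relying on) could look like:

from torch._inductor.codecache import FxGraphCache

# Sketch only: wrap_inductor() patches FxGraphCache.load, which torch 2.6 removed,
# hence the AttributeError in the traceback above.
if hasattr(FxGraphCache, "load"):
    original_load = FxGraphCache.load  # torch <= 2.5: the old hook point exists
else:
    # torch 2.6+: the hook point is gone, so the caching integration needs to be
    # reworked rather than patched (part of why this waits on #12393).
    original_load = None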

@youkaichao
Member

@mgoin can you help review and stamp that PR?

@zhouyuan
Contributor

zhouyuan commented Feb 7, 2025

@mgoin Thanks a lot for the update. IPEX CPU w/ PT 2.6 will be released next week. Will update on this as soon as the binary is out.

Cc: @Guobing-Chen @bigPYJ1151

Thanks, -yuan


mergify bot commented Feb 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 10, 2025
@jiangshaoping

I want to know when this PR will be merged.

@mergify mergify bot removed the needs-rebase label Feb 10, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
@ProExpertProg
Contributor

ProExpertProg commented Feb 24, 2025

Btw, PyTorch updated auto-functionalization, which I think will break our custom fusion passes, so we should disable it. There's an inductor config field called enable_auto_functionalized_v2.

@mgoin do you want me to open a separate PR or can you make the change? We should add this in config.py:3109:

if 'enable_auto_functionalized_v2' not in self.inductor_compile_config:
    self.inductor_compile_config['enable_auto_functionalized_v2'] = False

@bnellnm
Contributor

bnellnm commented Feb 24, 2025

Btw, PyTorch updated auto-functionalization, which I think will break our custom fusion passes, so we should disable it. There's an inductor config field called enable_auto_functionalized_v2.

@mgoin do you want me to open a separate PR or can you make the change? We should add this in config.py:3109:

if 'enable_auto_functionalized_v2' not in self.inductor_compile_config:
    self.inductor_compile_config['enable_auto_functionalized_v2'] = False

Disabling this feature doesn't fix the weight transpose problem.


mergify bot commented Feb 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 25, 2025
@zou3519

zou3519 commented Feb 25, 2025

If cutlass_scaled_mm is a custom op, then it's possible inductor changed the strides of its inputs. Is it possible to get something like the TORCH_LOGS from the run?

@tlrmchlsmth
Collaborator

@zou3519 yes, and thanks for taking a look! Here is one run with TORCH_LOGS=+inductor: scaled_mm_torch_2.6.log

This is the repro:

from vllm import LLM

llm = LLM(model="nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change", compilation_config=3)

Is there any way to forbid inductor from changing the strides in this case? I didn't manage to dig in too far, but it looks like it's transposing the weight matrix on us.
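
A small sketch of the repro above with inductor logging turned on, assuming the TORCH_LOGS environment variable is set before torch (via vllm) is imported:

import os

# Assumption: the env var must be set before torch is imported for the logging
# config to pick it up.
os.environ["TORCH_LOGS"] = "+inductor"

from vllm import LLM

llm = LLM(model="nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change",
          compilation_config=3)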

@mergify mergify bot removed the needs-rebase label Feb 25, 2025
@zou3519

zou3519 commented Feb 26, 2025

@zou3519 yes, and thanks for taking a look! Here is one run with TORCH_LOGS=+inductor: scaled_mm_torch_2.6.log

This is the repro:

from vllm import LLM

llm = LLM(model="nm-testing/tinyllama-oneshot-w8w8-test-static-shape-change", compilation_config=3)

Is there any way to forbid inductor from changing the strides in this case? I didn't manage to dig in too far, but it looks like it's transposing the weight matrix on us.

Yes, in here, change it to:

  ops.def(
      "cutlass_scaled_mm(Tensor! out, Tensor a,"
      "                  Tensor b, Tensor a_scales,"
      "                  Tensor b_scales, Tensor? bias) -> ()", {at::Tag::needs_fixed_stride_order});

It turns out PyTorch changed the default behavior for custom operators to be "requires_contiguous", which is not the best; this tag changes the behavior.
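
For context, the same tag can also be attached when an op is defined from Python via torch.library; a rough sketch with a hypothetical namespace and op name (not vLLM's actual registration, which is done in C++):

import torch
from torch.library import Library

# Hypothetical example: attaching needs_fixed_stride_order so inductor keeps the
# caller's input strides instead of assuming it may relayout the inputs.
my_lib = Library("my_ext", "DEF")
my_lib.define(
    "my_scaled_mm(Tensor! out, Tensor a, Tensor b, "
    "Tensor a_scales, Tensor b_scales, Tensor? bias) -> ()",
    tags=(torch.Tag.needs_fixed_stride_order,),
)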

@zou3519

zou3519 commented Feb 26, 2025

Some kind of failure due to the marlin view code: https://buildkite.com/vllm/ci/builds/13610#01951565-39bc-4c20-bdd0-e68b61ab1ea1/199-3448

I know what the bug here is (pytorch/pytorch#147924), trying to figure out what the best way to work around it is...

EDIT: the workaround is disabling auto_functionalized_v2, assuming that doesn't slow down your perf.

# I promise we'll fix this asap in PyTorch core, so you can just guard on 2.6
if torch.__version__.startswith("2.6"):
    self.inductor_compile_config['enable_auto_functionalized_v2'] = False

Are there any other torch.compile-related problems I can take a look at? It's unclear to me which of the failing tests are running torch.compile and which aren't.

@ProExpertProg
Contributor

@zou3519 yeah, we actually found the auto-func issue separately as well, and it would break our custom passes, so we'll disable it for now. We currently have a workaround for the issue V2 fixes (manual graph fixing to remove the copies).

Do we have to guard against the version or can we just disable V2 in all situations? If the config property doesn't exist, will it just be ignored or will it fail?

And I guess this additional issue means we should stick to V1 until torch 2.7 anyway. Or is there any way the promised fix gets added to a bug-fix torch release?

@ProExpertProg
Contributor

ProExpertProg commented Feb 26, 2025

In terms of tests, once we disable V2 as mentioned here, all tests in tests/compile/* are torch.compile-related.

@zou3519

zou3519 commented Feb 26, 2025

And I guess this additional issue means we should stick to V1 until torch 2.7 anyway. Or is there any way the promised fix gets added to a bug-fix torch release?

It's likely the fix will be added to 2.6.1 (the 2.6 patch release).

Do we have to guard against the version or can we just disable V2 in all situations? If the config property doesn't exist, will it just be ignored or will it fail?

I'm not sure how vLLM's config interacts with torch._inductor.config. But if the config doesn't exist on torch._inductor.config then trying to set it will fail. It depends on whether you want the code to also work for PyTorch 2.5 (we introduced this config in PyTorch 2.6, so it'll be around for the foreseeable future).

@ProExpertProg
Contributor

But if the config doesn't exist on torch._inductor.config then trying to set it will fail.

Yeah that's what I was wondering, so we should guard for 2.6+ if we want 2.5 to still work (seems worth it in this case).
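
A rough sketch of what that guard could look like, assuming it sits next to the existing inductor_compile_config handling and that packaging is available (not necessarily the exact change that lands):

import torch
from packaging.version import Version

# Only touch the flag on torch >= 2.6, where enable_auto_functionalized_v2 exists,
# so torch 2.5 installs keep working unchanged.
if Version(torch.__version__).release >= (2, 6):
    if 'enable_auto_functionalized_v2' not in self.inductor_compile_config:
        self.inductor_compile_config['enable_auto_functionalized_v2'] = False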

@tlrmchlsmth
Collaborator

Created #14306 for the scaled_mm issue

@tiran

tiran commented Mar 6, 2025

It's likely the fix will be added to 2.6.1 (the 2.6 patch release).

FYI, there will be no PyTorch 2.6.1:

This is to confirm that there will be no PyTorch 2.6.1 release and the next release will be of PyTorch 2.7 with release day on 4/23. We will be moving all outstanding items from the 2.6.1 milestone to the 2.7.0 milestone.

https://dev-discuss.pytorch.org/t/no-pytorch-2-6-1-release/2817

@simon-mo
Collaborator

simon-mo commented Mar 6, 2025

Hi @mgoin @tlrmchlsmth, what are the remaining blockers for this PR? (other than #14306)?

@tlrmchlsmth
Collaborator

@ProExpertProg do we still need to make the change to disable the V2 autofunctionalization?

@ProExpertProg
Contributor

Yes, we should. I can post a PR if needed

Collaborator

simon-mo commented Mar 6, 2025

Would be great to get this in quickly by tomorrow, so we can make it part of the v0.8.0 release.

@zou3519

zou3519 commented Mar 6, 2025

Yes, you'll need to disable v2 functionalization; I don't know how to work around this otherwise.

@SinanTokmak

Any fix?

Labels
ci/build ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[Installation]: Supporting PyTorch 2.6?