[torch.compile] rework compile control with piecewise cudagraph #9715

youkaichao · 2024-10-26T05:40:28Z

rework the compilation control.

the user-facing flags are:

| Usage | CompilationLevel | How vLLM uses Dynamo | How vLLM uses Inductor | Use vLLM's custom ops (*) | How to customize the compilation |
|---|---|---|---|---|---|
| `export VLLM_TORCH_COMPILE_LEVEL=0` (default) | NO_COMPILATION (0) | N/A | N/A | ✅ | N/A |
| `export VLLM_TORCH_COMPILE_LEVEL=1` | DYNAMO_AS_IS (1) | use as-is | N/A | ✅ | `vllm.plugins.set_torch_compile_backend` (defaults to `"eager"`) |
| `export VLLM_TORCH_COMPILE_LEVEL=2` | DYNAMO_ONCE (2) | use only once, making sure the computation graph does not change | N/A | ✅ | `vllm.plugins.set_torch_compile_backend` (defaults to `"eager"`) |
| `export VLLM_TORCH_COMPILE_LEVEL=3` | PIECEWISE (3) | same as 2 | compilation behavior determined by the config | ❌ | write a JSON config file and specify it with `VLLM_TORCH_COMPILE_CONFIG`, or call `vllm.plugins.set_compilation_config` to set the config directly |

(*) Users can also use the `VLLM_CUSTOM_OPS` environment variable for fine-grained control over custom ops.

For the detailed compilation config, please check the code doc.
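To make the table easier to scan, here is a minimal, self-contained sketch of the four levels. It only mirrors the table: the level names and the `VLLM_TORCH_COMPILE_LEVEL` variable come from this PR description, while `level_from_env` and `describe` are hypothetical helpers written for illustration and are not part of vLLM.

```python
import os
from enum import IntEnum


class CompilationLevel(IntEnum):
    """Level names and values as listed in the table above."""
    NO_COMPILATION = 0
    DYNAMO_AS_IS = 1
    DYNAMO_ONCE = 2
    PIECEWISE = 3


def level_from_env(environ=None):
    """Hypothetical helper: read the level from an environment mapping (default 0)."""
    environ = os.environ if environ is None else environ
    return CompilationLevel(int(environ.get("VLLM_TORCH_COMPILE_LEVEL", "0")))


def describe(level):
    """Hypothetical helper: summarize one table row as a dict."""
    dynamo = {
        CompilationLevel.NO_COMPILATION: "N/A",
        CompilationLevel.DYNAMO_AS_IS: "use as-is",
        CompilationLevel.DYNAMO_ONCE: "use only once",
        CompilationLevel.PIECEWISE: "use only once",  # "same as 2" in the table
    }[level]
    return {
        "dynamo": dynamo,
        # Only level 3 involves Inductor, and then only as the config dictates.
        "inductor": "determined by the config"
        if level == CompilationLevel.PIECEWISE else "N/A",
        # Custom ops are on by default for every level except PIECEWISE.
        "custom_ops_on_by_default": level != CompilationLevel.PIECEWISE,
    }


if __name__ == "__main__":
    level = level_from_env()
    print(level.name, describe(level))
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try without touching the real environment, for example `level_from_env({"VLLM_TORCH_COMPILE_LEVEL": "3"})`.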

Signed-off-by: youkaichao <youkaichao@gmail.com>

github-actions · 2024-10-26T05:40:39Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the `ready` label to the PR
- Enable auto-merge.

🚀

youkaichao · 2024-10-26T05:43:25Z

example code:

```python
import torch
from torch import nn

from vllm.compilation.compile_context import set_compile_context
from vllm.compilation.decorators import support_torch_compile
from vllm.plugins import set_attention_ops

# Register the custom op defined below as an attention op; attention ops
# are the boundaries between piecewise graphs.
set_attention_ops(["silly.attention"])


@torch.library.custom_op("silly::attention", mutates_args=["out"])
def silly_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                    out: torch.Tensor) -> None:
    # Toy "attention": copy q into the preallocated output, then bump element 0.
    print("silly")
    out.copy_(q)
    print(q)
    out[0] += 1


@silly_attention.register_fake
def _(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
      out: torch.Tensor) -> None:
    # Fake implementation used for tracing: the op only mutates `out`.
    return


@support_torch_compile
class SillyModel(nn.Module):

    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + 1
        x = x + 2
        # The output buffer is allocated before the attention op and
        # passed in for mutation.
        out = torch.empty_like(x)
        torch.ops.silly.attention(x, x, x, out)
        x = out
        x = x - 2
        x = x - 1
        out = torch.empty_like(x)
        torch.ops.silly.attention(x, x, x, out)
        x = out
        x = x + 1
        return x


model = SillyModel()

input_buffer = torch.randn(100).cuda()

with set_compile_context([1, 2]):
    # Run once on the full buffer, then on slices of size 2 and 1.
    model(input_buffer)

    model(input_buffer[:2])
    model(input_buffer[:1])

# Zero the first two elements so the final output is deterministic.
input_buffer[:2].zero_()
output = model(input_buffer[:2])
print(output.__class__)
print(output[:2])
```

run with:

`VLLM_LOGGING_LEVEL=DEBUG VLLM_TORCH_COMPILE_LEVEL=3 python test.py`
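As a sanity check on the last step of the script (the zeroed two-element input), the arithmetic can be traced by hand. The sketch below is a pure-Python stand-in for the model, using plain lists instead of tensors. The `_ref` function names are hypothetical, and it assumes the compiled model matches eager semantics.

```python
def silly_attention_ref(q):
    """Stand-in for silly::attention: copy q into a new buffer, then add 1 to element 0."""
    out = list(q)
    out[0] += 1
    return out


def silly_model_ref(x):
    """Mirror SillyModel.forward step by step on a plain list."""
    x = [v + 1 for v in x]
    x = [v + 2 for v in x]
    x = silly_attention_ref(x)
    x = [v - 2 for v in x]
    x = [v - 1 for v in x]
    x = silly_attention_ref(x)
    x = [v + 1 for v in x]
    return x


if __name__ == "__main__":
    # Zeros go to [3, 3], attention gives [4, 3], subtracting 3 gives [1, 0],
    # attention gives [2, 0], and the final +1 gives [3, 1].
    print(silly_model_ref([0.0, 0.0]))  # [3.0, 1.0]
```

Under that assumption, the final `print(output[:2])` in the script should show a tensor holding the values 3 and 1.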

requirements:

- Attention ops are the boundaries of the piecewise graphs.
- If the output of an attention op is used in the subsequent graph, it must be allocated in the previous graph and passed to the attention op for mutation.
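The first requirement can be pictured with a toy splitter over a flat list of op names. This is only an illustration of "attention ops are the boundaries": the thread does not show how vLLM splits the real graph, and `split_at_boundaries` is a hypothetical helper, not a vLLM function.

```python
def split_at_boundaries(ops, boundary_ops):
    """Split a flat op sequence into pieces, isolating each boundary op in its own piece."""
    pieces = []
    current = []
    for op in ops:
        if op in boundary_ops:
            # Close the piece collected so far, then emit the boundary op alone.
            if current:
                pieces.append(current)
                current = []
            pieces.append([op])
        else:
            current.append(op)
    if current:
        pieces.append(current)
    return pieces


if __name__ == "__main__":
    # Op sequence of SillyModel.forward from the example script.
    ops = ["add", "add", "empty_like", "silly.attention",
           "sub", "sub", "empty_like", "silly.attention", "add"]
    for piece in split_at_boundaries(ops, {"silly.attention"}):
        print(piece)
```

In this toy view, each `empty_like` lands in the piece just before an attention op, which matches the second requirement: the output buffer is allocated in the previous graph and handed to the attention op for mutation.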

~~# FIXME: it seems pytorch changes the output to a tuple~~
it can be fixed by pytorch/pytorch#138980

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao · 2024-10-29T20:15:04Z

I added some counter-based tests, following PyTorch's testing principles. @ProExpertProg

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao · 2024-10-30T06:03:42Z

The test failure is unrelated and also appears on the main branch; merging.


youkaichao added 8 commits October 25, 2024 19:41

piecewise compile

f376fbf

Signed-off-by: youkaichao <youkaichao@gmail.com>

only log once for the whole graph

7fd5612

Signed-off-by: youkaichao <youkaichao@gmail.com>

add cudagraph

0954dea

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix order

5cc46e8

Signed-off-by: youkaichao <youkaichao@gmail.com>

plugin

d29289e

Signed-off-by: youkaichao <youkaichao@gmail.com>

add logging

4a1a4e3

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix tuple

38d6ad2

Signed-off-by: youkaichao <youkaichao@gmail.com>

revert

63182cd

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao added 20 commits October 25, 2024 23:45

solve copy

e49aa85

Signed-off-by: youkaichao <youkaichao@gmail.com>

skip splitting for the whole graph

94cea2a

Signed-off-by: youkaichao <youkaichao@gmail.com>

rewrite control

e41f046

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix config

23c37c3

Signed-off-by: youkaichao <youkaichao@gmail.com>

add config path

2630ce4

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix json

9dcd517

Signed-off-by: youkaichao <youkaichao@gmail.com>

rename to piecewise

419cb49

Signed-off-by: youkaichao <youkaichao@gmail.com>

rename to non_cudagraph_ops

382ff07

Signed-off-by: youkaichao <youkaichao@gmail.com>

simplify config

255c2fd

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

1991582

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix

8acbdb7

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

ea27fc1

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix tests

57aafa0

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix pydantic

8e59196

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix bug

66eb9b7

Signed-off-by: youkaichao <youkaichao@gmail.com>

remove asserts

757cc05

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix config

6bd3635

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix inductor call

794d82e

Signed-off-by: youkaichao <youkaichao@gmail.com>

no cudagraph by default

481149e

Signed-off-by: youkaichao <youkaichao@gmail.com>

use config

6303dc5

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao added 5 commits October 29, 2024 12:59

add tests

c283e84

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

3fac3ae

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix

bf8afaa

Signed-off-by: youkaichao <youkaichao@gmail.com>

format

0507beb

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix tests

bc08d1a

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao requested a review from ProExpertProg October 29, 2024 20:14

ProExpertProg approved these changes Oct 29, 2024


youkaichao added 2 commits October 29, 2024 13:35

fix cudagraph inside

4dd7405

Signed-off-by: youkaichao <youkaichao@gmail.com>

comments

e54bc18

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Oct 29, 2024

youkaichao added 4 commits October 29, 2024 15:43

Merge branch 'main' into piece_wise

7ff5bb2

fix fake inputs

75681ce

Signed-off-by: youkaichao <youkaichao@gmail.com>

disable tests

c78283a

Signed-off-by: youkaichao <youkaichao@gmail.com>

Merge branch 'main' into piece_wise

57b135c

youkaichao merged commit ff5ed6e into vllm-project:main Oct 30, 2024
60 of 68 checks passed

youkaichao deleted the piece_wise branch October 30, 2024 06:03

rasmith pushed a commit to rasmith/vllm that referenced this pull request Oct 30, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

a43527c

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com>

NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Oct 31, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

f160306

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Oct 31, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

5ac7453

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

afdda1b

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

2ff6436

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>

hissu-hyvarinen pushed a commit to ROCm/vllm that referenced this pull request Nov 6, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

0252160

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com>

JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

6453eb9

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Loc Huynh <jc1da.3011@gmail.com>

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

a11a200

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

c4d6ec9

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com>

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[torch.compile] rework compile control with piecewise cudagraph (vllm…

48d6737

…-project#9715) Signed-off-by: youkaichao <youkaichao@gmail.com>
