[torch.compile] rework compile control with piecewise cudagraph #9715
Conversation
example code:

```python
import torch
from torch import nn

from vllm.compilation.decorators import support_torch_compile
from vllm.compilation.compile_context import set_compile_context
from vllm.plugins import set_attention_ops

# Register the custom op below as an attention op, so compilation
# splits the graph around it (piecewise compilation).
set_attention_ops(["silly.attention"])


# A stand-in "attention" op that mutates its output buffer in place.
@torch.library.custom_op("silly::attention", mutates_args=["out"])
def silly_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                    out: torch.Tensor) -> None:
    print("silly")
    out.copy_(q)
    print(q)
    out[0] += 1


# Fake (meta) implementation so the op can be traced without running.
@silly_attention.register_fake
def _(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
      out: torch.Tensor) -> None:
    return


@support_torch_compile
class SillyModel(nn.Module):

    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + 1
        x = x + 2
        out = torch.empty_like(x)
        torch.ops.silly.attention(x, x, x, out)
        x = out
        x = x - 2
        x = x - 1
        out = torch.empty_like(x)
        torch.ops.silly.attention(x, x, x, out)
        x = out
        x = x + 1
        return x


model = SillyModel()

input_buffer = torch.randn(100).cuda()

# Declare the sizes (1 and 2) for which cudagraphs will be captured.
with set_compile_context([1, 2]):
    model(input_buffer)        # general-shape compilation
    model(input_buffer[:2])    # capture for size 2
    model(input_buffer[:1])    # capture for size 1

input_buffer[:2].zero_()
output = model(input_buffer[:2])  # replays the captured cudagraph
print(output.__class__)
print(output[:2])
```

run with:

requirements:
I added some tests based on counters, following PyTorch's testing principles. @ProExpertProg
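As a minimal sketch of what a counter-based check can look like (an illustration in the spirit of PyTorch's own tests, not this PR's actual test code):

```python
import torch
from torch._dynamo.utils import counters

# Illustrative counter-based test: assert on Dynamo's compilation
# counters instead of timing or log output.
def test_no_recompilation() -> None:
    counters.clear()

    @torch.compile
    def f(x: torch.Tensor) -> torch.Tensor:
        return x * 2 + 1

    f(torch.randn(4))  # first call compiles
    f(torch.randn(4))  # same shape: should reuse the compiled graph
    assert counters["stats"]["unique_graphs"] == 1
```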
The test failure is unrelated and also appears on the main branch; merging.
Rework the compilation control.

The user-facing flags are:

- `export VLLM_TORCH_COMPILE_LEVEL=0` (default)
- `export VLLM_TORCH_COMPILE_LEVEL=1`: backend controlled by `vllm.plugins.set_torch_compile_backend` (defaults to `"eager"`)
- `export VLLM_TORCH_COMPILE_LEVEL=2`: backend controlled by `vllm.plugins.set_torch_compile_backend` (defaults to `"eager"`)
- `export VLLM_TORCH_COMPILE_LEVEL=3`: config controlled by the `VLLM_TORCH_COMPILE_CONFIG` env var, or call `vllm.plugins.set_compilation_config` to set the config directly*

*: users can also use the `VLLM_CUSTOM_OPS` env var to have fine-grained control over custom ops.

For the detailed compilation config, please check the code doc. A sketch of wiring these knobs together is shown below.