[issue tracker] make quantization compatible with dynamo dynamic shape #9234

Closed
youkaichao opened this issue Oct 10, 2024 · 5 comments · Fixed by #9299

@youkaichao (Member)

Anything you want to discuss about vllm.

Here is some simple demo code:

import torch
from torch.utils.cpp_extension import load_inline

# declare the schema for an op whose implementation is registered from C++
custom_library = torch.library.Library("custom", "DEF")
custom_library.define("add_cpp(Tensor x, int y) -> Tensor")

cpp_source = """
#include <torch/extension.h>

torch::Tensor custom_add(torch::Tensor x, int64_t y) {
    return x + y;
}

TORCH_LIBRARY_IMPL(custom, CPU, m) {
    m.impl("add_cpp", custom_add);
}
"""

custom_op = load_inline(
    name="custom_op",
    cpp_sources=cpp_source,
    extra_cflags=[],
    functions=["custom_add"]
)

# fake (meta) implementation for the C++-registered op
@torch.library.register_fake("custom::add_cpp")
def _(x: torch.Tensor, y: int) -> torch.Tensor:
    return torch.empty((y,), dtype=torch.float32)

# the same op, registered entirely from the Python side
@torch.library.custom_op("custom::add_py", mutates_args=[])
def add_py(x: torch.Tensor, y: int) -> torch.Tensor:
    return x + y

@add_py.register_fake
def _(x: torch.Tensor, y: int) -> torch.Tensor:
    return torch.empty((y,), dtype=torch.float32)

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    # return torch.ops.custom.add_py(x, x.shape[0]) # passes
    return torch.ops.custom.add_cpp(x, x.shape[0]) # errors with `Not all values of RelaxedUnspecConstraint(L['x'].size()[0]) are valid because L['x'].size()[0] was inferred to be a constant (2).`

x = torch.ones(2, 4)
torch._dynamo.mark_dynamic(x, 0)
print(f(x)[0])

When we register the custom op from the C++ side, the dynamic shape is specialized directly to a concrete integer and compilation fails.
When we register the custom op from the Python side, dynamic shapes work as expected.

We should change the way we register quantization custom ops, moving from the C++ side to the Python side.
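A rough sketch of what that could look like for a quantization kernel (the op name `my_quant_gemm` and its signature are hypothetical placeholders, not actual vLLM ops), assuming the C++ kernel stays as-is and only the registration moves to Python:

import torch

# hypothetical Python-side registration wrapping an existing C++ kernel,
# so dynamo traces a Python custom op instead of the raw C++ schema
@torch.library.custom_op("vllm::my_quant_gemm", mutates_args=[])
def my_quant_gemm(a: torch.Tensor, b_q_weight: torch.Tensor,
                  size_m: int, size_n: int, size_k: int) -> torch.Tensor:
    # dispatch to the underlying C++ op (placeholder name)
    return torch.ops._C.my_quant_gemm(a, b_q_weight, size_m, size_n, size_k)

@my_quant_gemm.register_fake
def _(a, b_q_weight, size_m, size_n, size_k):
    # shape- and dtype-only implementation used during tracing
    return torch.empty((size_m, size_n), device=a.device, dtype=a.dtype)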

There is also one complicated object, `ScalarType` (see `class scalar_types` in vLLM), that appears as a custom op parameter:

vllm/vllm/_custom_ops.py, lines 315 to 321 (at f3a507f):

@register_fake("_C::gptq_marlin_24_gemm")
def _gptq_marlin_24_gemm_fake(a: torch.Tensor, b_q_weight: torch.Tensor,
b_meta: torch.Tensor, b_scales: torch.Tensor,
workspace: torch.Tensor,
b_q_type: ScalarType, size_m: int,
size_n: int, size_k: int) -> torch.Tensor:
return torch.empty((size_m, size_n), device=a.device, dtype=a.dtype)

We can use strings to represent the type in the op schema, and look up the actual `ScalarType` object before passing it into the C++ function.
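A minimal sketch of that lookup, assuming a `scalar_types` registry like the one in vllm/scalar_type.py (the wrapper name, op namespace, and the exact mapping below are illustrative, not vLLM's actual code):

import torch
from vllm.scalar_type import ScalarType, scalar_types  # assumed import path

# hypothetical string -> ScalarType lookup table
SCALAR_TYPE_REGISTRY = {
    "uint4b8": scalar_types.uint4b8,
    "uint8b128": scalar_types.uint8b128,
}

@torch.library.custom_op("vllm::gptq_marlin_24_gemm", mutates_args=[])
def gptq_marlin_24_gemm(a: torch.Tensor, b_q_weight: torch.Tensor,
                        b_meta: torch.Tensor, b_scales: torch.Tensor,
                        workspace: torch.Tensor,
                        b_q_type_str: str, size_m: int,
                        size_n: int, size_k: int) -> torch.Tensor:
    # the schema only sees a plain string; the real ScalarType object is
    # looked up here and handed to the underlying C++ kernel
    b_q_type = SCALAR_TYPE_REGISTRY[b_q_type_str]
    return torch.ops._C.gptq_marlin_24_gemm(a, b_q_weight, b_meta, b_scales,
                                            workspace, b_q_type, size_m,
                                            size_n, size_k)

@gptq_marlin_24_gemm.register_fake
def _(a, b_q_weight, b_meta, b_scales, workspace,
      b_q_type_str, size_m, size_n, size_k):
    return torch.empty((size_m, size_n), device=a.device, dtype=a.dtype)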

@youkaichao (Member, Author)

cc @bnellnm

@youkaichao (Member, Author)

There's a related issue in PyTorch, pytorch/pytorch#112883, and the comments there suggest that PyTorch will not fix it in the near future.

I tested with a PyTorch nightly (2.6.0.dev20241004), and it still has this problem.

@bnellnm (Contributor) commented Oct 10, 2024

I was able to work around the problem by modifying the schemas to take SymInts. I'll look into the scalar_type issue.

import torch
from torch.utils.cpp_extension import load_inline

custom_library = torch.library.Library("custom", "DEF")
custom_library.define("add_cpp(Tensor x, SymInt y) -> Tensor")

cpp_source = """                                                                                                                                    
#include <torch/extension.h>                                                                                                                        
                                                                                                                                                    
torch::Tensor custom_add(torch::Tensor x, int64_t y) {                                                                                              
    return x + y;                                                                                                                                   
}                                                                                                                                                   
                                                                                                                                                    
TORCH_LIBRARY_IMPL(custom, CPU, m) {                                                                                                                
    m.impl("add_cpp", custom_add);                                                                                                                  
}                                                                                                                                                   
"""

custom_op = load_inline(
    name="custom_op",
    cpp_sources=cpp_source,
    extra_cflags=[],
    functions=["custom_add"]
)

@torch.library.register_fake("custom::add_cpp")
def _(x: torch.Tensor, y: torch.SymInt) -> torch.Tensor:
    return torch.empty((y,), dtype=torch.float32)

import torch

@torch.library.custom_op("custom::add_py", mutates_args=[])
def add_py(x: torch.Tensor, y: int) -> torch.Tensor:
    return x + y

@add_py.register_fake
def _(x: torch.Tensor, y: int) -> torch.Tensor:
    return torch.empty((y,), dtype=torch.float32)

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    # return torch.ops.custom.add_py(x, x.shape[0]) # passes
    return torch.ops.custom.add_cpp(x, x.shape[0]) # now also passes with the SymInt schema

x = torch.ones(2, 4)
torch._dynamo.mark_dynamic(x, 0)
print(f(x)[0])

@bnellnm (Contributor) commented Oct 10, 2024

This also works.

import torch
from torch.utils.cpp_extension import load_inline

custom_library = torch.library.Library("custom", "DEF")

cpp_source = """                                                                                                                                    
#include <torch/extension.h>                                                                                                                        
                                                                                                                                                    
torch::Tensor custom_add(torch::Tensor x, int64_t y) {                                                                                              
    return x + y;                                                                                                                                   
}                                                                                                                                                   
                                                                                                                                                    
TORCH_LIBRARY_FRAGMENT(custom, m)                                                                                                                   
{                                                                                                                                                   
    m.def("add_cpp(Tensor x, SymInt y) -> Tensor");                                                                                                 
    m.impl("add_cpp", torch::kCPU, custom_add);                                                                                                     
}                                                                                                                                                   
"""
            
custom_op = load_inline(
    name="custom_op",
    cpp_sources=cpp_source,
    extra_cflags=[],
    functions=["custom_add"]
)

@torch.library.register_fake("custom::add_cpp")
def _(x: torch.Tensor, y: torch.SymInt) -> torch.Tensor:
    return torch.empty((y,), dtype=torch.float32)

I think the ScalarType problem is orthogonal to the SymInt problem.

@youkaichao (Member, Author)

> I think the ScalarType problem is orthogonal to the SymInt problem.

Yes, they are two separate problems. For dynamo dynamic shapes to understand quantization ops, both problems need to be solved.
