
🐛 [Bug] non-contiguous tensor with GroupNorm's affine=False causes shape change error #3324

Open
Jason3900 opened this issue Dec 15, 2024 · 1 comment
Labels: bug (Something isn't working)

@Jason3900

Bug Description

A non-contiguous input tensor combined with GroupNorm's affine=False causes the following error:

DEBUG:torch_tensorrt.dynamo.conversion._TRTInterpreter:Converting node norm_fn/native_group_norm (kind: aten.native_group_norm.default, args: ('clone <Node>', 'None <NoneType>', 'None <NoneType>', '5 <int>', '512 <int>', '256 <int>', '32 <int>', '1e-05 <float>'))
ERROR:torch_tensorrt [TensorRT Conversion Context]:ITensor::getDimensions: Error Code 4: Shape Error (reshape changes volume. Reshaping [1] to [1,512,1,1].)
ERROR:torch_tensorrt [TensorRT Conversion Context]:ITensor::getDimensions: Error Code 4: API Usage Error (Output shape can not be computed for node [SHUFFLE]-[aten_ops.native_group_norm.default]-[norm_fn/native_group_norm_reshape_gamma].)
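The volume mismatch in the log can be reproduced in plain PyTorch: a one-element tensor (volume 1) cannot be reshaped to [1, 512, 1, 1] (volume 512). This suggests, though the converter source would need to confirm it, that with affine=False the converter substitutes a placeholder for the missing gamma and then reshapes it as if it had num_channels elements:

```python
import torch

# The converter reshapes gamma to [1, C, 1, 1] for broadcasting. With
# affine=False gamma is None; a scalar stand-in has volume 1 and cannot
# be reshaped to volume 512, matching the shape error above.
gamma = torch.ones(1)
try:
    gamma.reshape(1, 512, 1, 1)
except RuntimeError as e:
    print(e)  # shape '[1, 512, 1, 1]' is invalid for input of size 1
```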

To Reproduce

Steps to reproduce the behavior:

import torch
import torch.nn as nn
import torch_tensorrt
from einops import rearrange


class GroupNormSpatial(nn.Module):
    """GroupNorm applied per frame, with the temporal dimension folded into the batch."""

    def __init__(self, num_groups, num_channels, epsilon: float = 1e-5, affine=True):
        super().__init__()
        self.norm_fn = nn.GroupNorm(num_groups=num_groups, num_channels=num_channels, eps=epsilon, affine=affine)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        b, t = inputs.shape[0], inputs.shape[2]
        # rearrange produces a non-contiguous tensor; adding .contiguous()
        # does not help when compiling the model with torch_tensorrt
        inputs = rearrange(inputs, "b c t h w -> (b t) c h w")
        out = self.norm_fn(inputs)  # this raises the error above
        out = rearrange(out, "(b t) c h w -> b c t h w", b=b, t=t)
        return out


model = GroupNormSpatial(num_groups=32, num_channels=512, affine=False)

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 512, 5, 16, 16))],
    debug=True,
    ir="dynamo",
    enabled_precisions={torch.float32},
    make_refitable=True,
)
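For contrast, a quick sanity check (not part of the original report) shows that eager-mode GroupNorm with affine=False handles a non-contiguous input without complaint, so the failure is specific to the torch_tensorrt conversion path:

```python
import torch
import torch.nn as nn

norm = nn.GroupNorm(num_groups=32, num_channels=512, eps=1e-5, affine=False)

# Slicing with a step produces a non-contiguous view of shape (5, 512, 16, 16)
x = torch.randn(5, 512, 16, 32)[..., ::2]
assert not x.is_contiguous()

out = norm(x)
print(out.shape)  # torch.Size([5, 512, 16, 16])
```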

Environment

I use the NGC PyTorch image nvcr.io/nvidia/pytorch:24.10-py3, with torch_tensorrt==2.5.0a0.

@Jason3900 Jason3900 added the bug Something isn't working label Dec 15, 2024
@apbose
Collaborator

apbose commented Jan 15, 2025

Taking a look
