-
Hi!
-
I feel LN is a key benefit of ConvNeXt. It's nice we have options that don't require BatchNorm to perform well, and indeed they appear to have improved robustness... As you've noted, support is there; some benchmark numbers show it's a clear win with channels_last on a recent GPU (3090), but not a win without it...
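For context, the channels_last vs channels_first gap comes down to whether `nn.LayerNorm` can normalize the last (contiguous) dimension directly, or needs permutes around it. A rough micro-benchmark sketch of that comparison; the shapes, sizes, and `bench` helper here are arbitrary illustrations, not the benchmark referenced above:

```python
import time

import torch
import torch.nn as nn

def bench(fn, x, iters=100):
    fn(x)  # warmup
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

dim, n, hw = 256, 32, 56
ln = nn.LayerNorm(dim).cuda()

# channels_last layout: C is the last, contiguous dim, so LayerNorm applies directly
x_nhwc = torch.randn(n, hw, hw, dim, device='cuda')

# channels_first layout: permute NCHW -> NHWC for the norm, then permute back
x_nchw = torch.randn(n, dim, hw, hw, device='cuda')
def ln_nchw(x):
    return ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(f'channels_last : {bench(ln, x_nhwc) * 1e3:.3f} ms')
print(f'channels_first: {bench(ln_nchw, x_nchw) * 1e3:.3f} ms')
```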
-
I've got a bit of magic on a dev branch (more_vit) I'll be merging soon... and then...

[benchmark numbers with and without the change followed; not preserved here]
-
FYI, I trained all of the atto/pico/nano models with the fast LN (prevents upcasting to float32) enabled, so the precision difference appears to have no impact there, but I'm not sure if it will cause issues at larger scale. It appears to cause no issues for inference, though...
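To illustrate what "fast LN" means here: under autocast, PyTorch normally upcasts `layer_norm` to float32; keeping the norm in the low-precision dtype avoids that. A minimal sketch of the idea (the `FastLayerNorm` name and structure are illustrative, not necessarily timm's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastLayerNorm(nn.LayerNorm):
    # Illustrative sketch: run LayerNorm in the autocast dtype (fp16/bf16)
    # instead of letting autocast upcast the op to float32.
    def forward(self, x):
        if torch.is_autocast_enabled():
            dt = torch.get_autocast_gpu_dtype()
            # Disable autocast so F.layer_norm runs in the low-precision dtype.
            with torch.autocast(device_type='cuda', enabled=False):
                return F.layer_norm(
                    x.to(dt), self.normalized_shape,
                    self.weight.to(dt) if self.weight is not None else None,
                    self.bias.to(dt) if self.bias is not None else None,
                    self.eps,
                )
        return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
```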
-
Just wanted to express support for the idea of having a ConvNeXt version with BN. It would be beneficial for running on efficient inference hardware.
-
@pjvanbeek worth checking out InceptionNeXt ... it's batchnorm by default and based on ConvNeXt. It has 3x3 + 1x11 + 11x1 conv (all DW) branches as the token mixing operation instead of the 7x7 DW of ConvNeXt. https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/inception_next.py
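For reference, that token mixer looks roughly like the sketch below, following the InceptionNeXt paper's defaults (channels split across an identity branch plus the three depthwise conv branches); the branch ratio and exact details may differ from the timm implementation linked above:

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    # Sketch of InceptionNeXt-style token mixing: channels are split and
    # routed through identity, 3x3, 1x11, and 11x1 depthwise conv branches.
    def __init__(self, dim, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        gc = int(dim * branch_ratio)  # channels per conv branch
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel, padding=square_kernel // 2, groups=gc)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel), padding=(0, band_kernel // 2), groups=gc)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel, 1), padding=(band_kernel // 2, 0), groups=gc)
        self.split_sizes = (dim - 3 * gc, gc, gc, gc)

    def forward(self, x):
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1,
        )
```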