-
Hi!
-
I feel LN is a key benefit of ConvNeXt. It's nice we have options that don't require BatchNorm to perform well, and indeed they appear to have improved robustness... As you've noted, support is there; some benchmark numbers show it's a clear win with channels_last on a recent GPU (3090), but not a win without it...
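For context, the channels_last vs channels_first gap comes down to whether `nn.LayerNorm` can normalize the last (contiguous) dimension directly, or needs permutes around it. A rough micro-benchmark sketch of that comparison; the shapes, sizes, and `bench` helper here are arbitrary illustrations, not the benchmark referenced above:

```python
import time

import torch
import torch.nn as nn

def bench(fn, x, iters=100):
    fn(x)  # warmup
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

dim, n, hw = 256, 32, 56
ln = nn.LayerNorm(dim).cuda()

# channels_last layout: C is the last, contiguous dim, so LayerNorm applies directly
x_nhwc = torch.randn(n, hw, hw, dim, device='cuda')

# channels_first layout: permute NCHW -> NHWC for the norm, then permute back
x_nchw = torch.randn(n, dim, hw, hw, device='cuda')
def ln_nchw(x):
    return ln(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(f'channels_last : {bench(ln, x_nhwc) * 1e3:.3f} ms')
print(f'channels_first: {bench(ln_nchw, x_nchw) * 1e3:.3f} ms')
```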
-
I've got a bit of magic on a dev branch (more_vit) I'll be merging soon... and then...

[benchmark numbers with and without the change followed; not preserved here]
-
FYI, I trained all of the atto/pico/nano models with the fast LN (prevents upcasting to float32) enabled, so the precision difference appears to have no impact there, but I'm not sure if it will cause issues at larger scale. It appears to cause no issues for inference, though...
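To illustrate what "fast LN" means here: under autocast, PyTorch normally upcasts `layer_norm` to float32; keeping the norm in the low-precision dtype avoids that. A minimal sketch of the idea (the `FastLayerNorm` name and structure are illustrative, not necessarily timm's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FastLayerNorm(nn.LayerNorm):
    # Illustrative sketch: run LayerNorm in the autocast dtype (fp16/bf16)
    # instead of letting autocast upcast the op to float32.
    def forward(self, x):
        if torch.is_autocast_enabled():
            dt = torch.get_autocast_gpu_dtype()
            # Disable autocast so F.layer_norm runs in the low-precision dtype.
            with torch.autocast(device_type='cuda', enabled=False):
                return F.layer_norm(
                    x.to(dt), self.normalized_shape,
                    self.weight.to(dt) if self.weight is not None else None,
                    self.bias.to(dt) if self.bias is not None else None,
                    self.eps,
                )
        return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
```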
-
Just wanted to express support for the idea of having a ConvNeXt version with BN. It would be beneficial for running on efficient inference hardware.
-
@pjvanbeek worth checking out InceptionNeXt ... it's batchnorm by default and based on ConvNeXt. It has 3x3 + 1x11 + 11x1 conv (all DW) branches as the token mixing operation instead of the 7x7 DW of ConvNeXt. https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/inception_next.py
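For reference, that token mixer looks roughly like the sketch below, following the InceptionNeXt paper's defaults (channels split across an identity branch plus the three depthwise conv branches); the branch ratio and exact details may differ from the timm implementation linked above:

```python
import torch
import torch.nn as nn

class InceptionDWConv2d(nn.Module):
    # Sketch of InceptionNeXt-style token mixing: channels are split and
    # routed through identity, 3x3, 1x11, and 11x1 depthwise conv branches.
    def __init__(self, dim, square_kernel=3, band_kernel=11, branch_ratio=0.125):
        super().__init__()
        gc = int(dim * branch_ratio)  # channels per conv branch
        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel, padding=square_kernel // 2, groups=gc)
        self.dwconv_w = nn.Conv2d(gc, gc, (1, band_kernel), padding=(0, band_kernel // 2), groups=gc)
        self.dwconv_h = nn.Conv2d(gc, gc, (band_kernel, 1), padding=(band_kernel // 2, 0), groups=gc)
        self.split_sizes = (dim - 3 * gc, gc, gc, gc)

    def forward(self, x):
        x_id, x_hw, x_w, x_h = torch.split(x, self.split_sizes, dim=1)
        return torch.cat(
            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)),
            dim=1,
        )
```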