- Origin Repo: naver-ai/pit
- Code: pit.py
- Evaluation Transforms:
```python
# backend: pil
# input_size: 224x224
import torchvision.transforms as T  # assumes torchvision transforms

transforms = T.Compose([
    T.Resize(248, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```
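As a usage reference, the sketch below applies the `transforms` pipeline above to a single image and runs one forward pass. The image path and the `pit_ti` constructor (with a `pretrained` flag) are assumptions for illustration; see the Model Details table below for the available model names.

```python
import torch
from PIL import Image

from pit import pit_ti  # assumption: pit.py exposes a constructor per model name

model = pit_ti(pretrained=True)  # assumption: constructors accept a pretrained flag
model.eval()

img = Image.open('example.jpg').convert('RGB')  # hypothetical input image
x = transforms(img).unsqueeze(0)                # (1, 3, 224, 224) batch

with torch.no_grad():
    logits = model(x)
print(logits.argmax(dim=1))  # predicted ImageNet class index
```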
- Model Details:
| Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
|---|---|---|---|---|---|---|
| PiT-Ti | pit_ti | 4.9 | 0.7 | 72.91 | 91.40 | Download |
| PiT-XS | pit_xs | 10.6 | 1.4 | 78.18 | 94.16 | Download |
| PiT-S | pit_s | 23.5 | 2.9 | 81.08 | 95.33 | Download |
| PiT-B | pit_b | 73.8 | 12.5 | 82.44 | 95.71 | Download |
| PiT-Ti distilled | pit_ti_distilled | 4.9 | 0.7 | 74.54 | 92.10 | Download |
| PiT-XS distilled | pit_xs_distilled | 10.6 | 1.4 | 79.31 | 94.36 | Download |
| PiT-S distilled | pit_s_distilled | 23.5 | 2.9 | 81.99 | 95.79 | Download |
| PiT-B distilled | pit_b_distilled | 73.8 | 12.5 | 84.14 | 96.86 | Download |
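As a sketch only, assuming pit.py defines one callable per entry in the Model Name column (e.g. `pit_s_distilled`), a model can be built from its name string and its parameter count compared against the table:

```python
import pit  # assumption: pit.py defines pit_ti, pit_xs, ..., pit_b_distilled as callables

def create_model(model_name: str, pretrained: bool = True):
    """Build a PiT variant by the name listed in the Model Name column."""
    if not hasattr(pit, model_name):
        raise ValueError(f'Unknown model name: {model_name}')
    return getattr(pit, model_name)(pretrained=pretrained)

model = create_model('pit_s_distilled')
params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f'{params_m:.1f}M parameters')  # should roughly match the Params (M) column
```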
- Citation:
```bibtex
@article{heo2021pit,
  title={Rethinking Spatial Dimensions of Vision Transformers},
  author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
  journal={arXiv preprint arXiv:2103.16302},
  year={2021}
}
```