PiT

  • Paper: Rethinking Spatial Dimensions of Vision Transformers

  • Original Repo: naver-ai/pit

  • Code: pit.py

  • Evaluate Transforms:

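    # backend: pil
    # input_size: 224x224
    # Assumption: T is paddle.vision.transforms, which accepts string interpolation names.
    import paddle.vision.transforms as T

    transforms = T.Compose([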
        T.Resize(248, interpolation='bicubic'),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
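The geometry of the evaluation pipeline above can be checked without any imaging library. The following is a minimal sketch, assuming that `Resize(248)` scales the shorter side to 248 pixels while keeping the aspect ratio and that fractional sizes are truncated; the helper names `resized_size`, `center_crop_box`, and `normalize_pixel` are hypothetical and not part of any library, and real implementations may round differently.

```python
# Evaluation-time constants copied from the transform pipeline above.
RESIZE = 248
CROP = 224
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def resized_size(width, height, size=RESIZE):
    """Scale so the shorter side equals `size`, keeping the aspect ratio.

    Truncating with int() is an assumption; libraries may round differently.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop=CROP):
    """Return (left, top, right, bottom) of a centered crop-by-crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map 8-bit RGB to [0, 1], then standardize each channel with MEAN/STD."""
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD)]


if __name__ == "__main__":
    w, h = resized_size(640, 480)          # a landscape input image
    print("resized:", (w, h))
    print("crop box:", center_crop_box(w, h))
    print("kept fraction per side:", CROP / RESIZE)
    print("white pixel:", normalize_pixel([255, 255, 255]))
```

Under these assumptions the crop keeps roughly 90% of the shorter side (224 / 248), a common evaluation setting for ImageNet models.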
  • Model Details:

    | Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
    |---|---|---|---|---|---|---|
    | PiT-Ti | pit_ti | 4.9 | 0.7 | 72.91 | 91.40 | Download |
    | PiT-XS | pit_xs | 10.6 | 1.4 | 78.18 | 94.16 | Download |
    | PiT-S | pit_s | 23.5 | 2.9 | 81.08 | 95.33 | Download |
    | PiT-B | pit_b | 73.8 | 12.5 | 82.44 | 95.71 | Download |
    | PiT-Ti distilled | pit_ti_distilled | 4.9 | 0.7 | 74.54 | 92.10 | Download |
    | PiT-XS distilled | pit_xs_distilled | 10.6 | 1.4 | 79.31 | 94.36 | Download |
    | PiT-S distilled | pit_s_distilled | 23.5 | 2.9 | 81.99 | 95.79 | Download |
    | PiT-B distilled | pit_b_distilled | 73.8 | 12.5 | 84.14 | 96.86 | Download |
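To compare the plain and distilled variants listed above, the accuracy columns can be loaded into a small dictionary and differenced. This is only an illustrative sketch: the `RESULTS` dictionary and the `distillation_gain` helper are hypothetical names, and the numbers are copied directly from the table.

```python
# (Top-1 %, Top-5 %) per model name, copied from the Model Details table.
RESULTS = {
    "pit_ti": (72.91, 91.40),
    "pit_xs": (78.18, 94.16),
    "pit_s": (81.08, 95.33),
    "pit_b": (82.44, 95.71),
    "pit_ti_distilled": (74.54, 92.10),
    "pit_xs_distilled": (79.31, 94.36),
    "pit_s_distilled": (81.99, 95.79),
    "pit_b_distilled": (84.14, 96.86),
}


def distillation_gain(name):
    """Return the (top-1, top-5) gain in percentage points from distillation."""
    base = RESULTS[name]
    distilled = RESULTS[name + "_distilled"]
    return tuple(round(d - b, 2) for b, d in zip(base, distilled))


if __name__ == "__main__":
    for name in ("pit_ti", "pit_xs", "pit_s", "pit_b"):
        top1, top5 = distillation_gain(name)
        print(f"{name}: +{top1:.2f} top-1, +{top5:.2f} top-5")
```

Parameter counts and FLOPs are identical for each plain/distilled pair in the table, so the gains come at no reported inference cost.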
  • Citation:

    @article{heo2021pit,
        title={Rethinking Spatial Dimensions of Vision Transformers},
        author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
        journal={arXiv: 2103.16302},
        year={2021},
    }