Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

This repository contains the PyTorch implementation of Sparsifiner (CVPR 2023).

[Project Page] [arXiv (CVPR 2023)]

Usage

Requirements

  • torch>=1.8.1
  • torchvision>=0.9.1
  • timm==0.3.2
  • tensorboardX
  • six
  • fvcore
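
As a quick sanity check before training, the pins above can be compared against the current environment. The following is a minimal sketch, not part of this repository: the helper names (`numeric_prefix`, `satisfies`, `check_requirements`) are hypothetical, it uses only the standard library (`importlib.metadata`, Python 3.8+), and it compares only the leading numeric version components, so pre-release and local version tags are handled approximately.

```python
import re
from importlib import metadata

# Pins copied from the requirements list above; None means "any version".
REQUIREMENTS = {
    "torch": (">=", "1.8.1"),
    "torchvision": (">=", "0.9.1"),
    "timm": ("==", "0.3.2"),
    "tensorboardX": None,
    "six": None,
    "fvcore": None,
}


def numeric_prefix(version):
    """Return the leading numeric components, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.groups() if part is not None)


def satisfies(installed, op, required):
    """Approximate check of 'installed op required' for op in {'>=', '=='}."""
    have, want = numeric_prefix(installed), numeric_prefix(required)
    return have >= want if op == ">=" else have == want


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to its installed version and status, or 'missing'."""
    report = {}
    for name, pin in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = pin is None or satisfies(installed, *pin)
        status = "ok" if ok else f"does not match {pin[0]}{pin[1]}"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Note that `timm` is pinned exactly, so a newer `timm` already present in the environment will be reported as a mismatch.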

Data preparation: download and extract the ImageNet images from http://image-net.org/. The directory structure should be:

│ILSVRC2012/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
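
The layout above can be verified before launching a long training run. The sketch below is an illustration only and not part of this repository: `summarize_imagenet_layout` is a hypothetical helper that uses the standard library to count class folders and `.JPEG` files under `train/` and `val/`, assuming the dataset root is the `ILSVRC2012/` directory shown above.

```python
from pathlib import Path


def summarize_imagenet_layout(root):
    """Count class folders and .JPEG files under root/train and root/val."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one sub-directory named by its WordNet ID (e.g. n01440764).
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        num_images = sum(
            1
            for class_dir in class_dirs
            for f in class_dir.iterdir()
            if f.suffix.upper() == ".JPEG"
        )
        summary[split] = {"classes": len(class_dirs), "images": num_images}
    return summary
```

For example, `summarize_imagenet_layout("ILSVRC2012")` on a complete ImageNet-1k copy should report 1000 classes in each split, with 1,281,167 training images and 50,000 validation images. A validation folder with images but zero class sub-directories usually means the validation set was not sorted into per-class folders.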

Model preparation: download pre-trained models if necessary:

model        url     model      url
DeiT-Small   link    LVViT-S    link
DeiT-Base    link    LVViT-M    link

Training

To train a Sparsifiner model with the default configuration on ImageNet, run:

Sparsifiner-S

Train on 8 GPUs

bash run_model.sh --IMNET sparsifiner_default 8

License

MIT License

Acknowledgements

Our code is based on DynamicViT, pytorch-image-models, DeiT, and LV-ViT.

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Wei_2023_CVPR,
    author    = {Wei, Cong and Duke, Brendan and Jiang, Ruowei and Aarabi, Parham and Taylor, Graham W. and Shkurti, Florian},
    title     = {Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {22680-22689}
}