- Paper: [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/abs/2102.12122)
- Origin Repo: [whai362/PVT](https://github.com/whai362/PVT)
- Code: pvt.py
- Evaluate Transforms:
  ```python
  # backend: pil
  # input_size: 224x224
  transforms = T.Compose([
      T.Resize(248, interpolation='bicubic'),
      T.CenterCrop(224),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225])
  ])
  ```
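
  The snippet above is only the preprocessing recipe. Below is a minimal sketch of applying it to a single image, assuming `T` is `paddle.vision.transforms` (suggested by the `pil` backend comment and the string-valued `interpolation` argument) and using a placeholder image path:

  ```python
  import paddle
  import paddle.vision.transforms as T
  from PIL import Image

  # Same evaluation preprocessing as listed above (ImageNet mean/std).
  transforms = T.Compose([
      T.Resize(248, interpolation='bicubic'),
      T.CenterCrop(224),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225])
  ])

  img = Image.open('example.jpg').convert('RGB')  # placeholder path
  x = transforms(img)                             # float32 tensor, CHW
  x = paddle.unsqueeze(x, axis=0)                 # add batch dim -> [1, 3, 224, 224]
  ```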
- Model Details:

  | Model | Model Name | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Pretrained Model |
  |:-----:|:----------:|:----------:|:---------:|:---------:|:---------:|:----------------:|
  | PVT-Tiny   | pvt_ti | 13.2 | 1.9 | 74.96 | 92.47 | Download |
  | PVT-Small  | pvt_s  | 24.5 | 3.8 | 79.87 | 95.05 | Download |
  | PVT-Medium | pvt_m  | 44.2 | 6.7 | 81.48 | 95.75 | Download |
  | PVT-Large  | pvt_l  | 61.4 | 9.8 | 81.74 | 95.87 | Download |
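
  A hedged sketch of loading one of the models above for single-image inference. The constructor name `pvt_ti` and its `pretrained` flag are assumptions based on the "Model Name" column, not a confirmed API; check `pvt.py` for the actual loading interface:

  ```python
  import paddle
  import paddle.nn.functional as F
  from pvt import pvt_ti  # hypothetical constructor named after the table entry

  model = pvt_ti(pretrained=True)  # PVT-Tiny: 13.2M params, 74.96% top-1
  model.eval()

  x = paddle.randn([1, 3, 224, 224])  # stand-in; use the preprocessed image from the sketch above

  with paddle.no_grad():
      logits = model(x)
      probs = F.softmax(logits, axis=-1)
      scores, classes = paddle.topk(probs, k=5, axis=-1)
      print(classes.numpy(), scores.numpy())  # top-5 class ids and probabilities
  ```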
- Citation:

  ```bibtex
  @misc{wang2021pyramid,
        title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
        author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
        year={2021},
        eprint={2102.12122},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }
  ```