Skip to content

Latest commit

 

History

History
74 lines (49 loc) · 2.39 KB

README.md

File metadata and controls

74 lines (49 loc) · 2.39 KB

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer

Official repository for the AAAI2025 paper Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer [paper] [website].

SparseViT Framework Diagram

In summary, SparseViT leverages the distinction between semantic and non-semantic features, enabling the model to adaptively extract non-semantic features that are more critical for image manipulation localization. This provides a novel approach to precisely identifying manipulated regions.

Test setup (Code + Models)

1) Set up the coding environment
  • First, clone the repository:
git clone https://github.com/scu-zjz/SparseViT.git
  • Our environment
Ubuntu LTS 20.04.1

CUDA 11.5 + cudnn 8.4.0

Python 3.10

PyTorch 2.4
pip install -r requirements.txt
2) Download our pretrained checkpoints
  • Download our pretrained checkpoints from Google Drive and place them in the checkpoint directory.

Scripts

This should be super easy! Simply run

python main_test.py

Here, we have simply provided a basic test of SparseViT. Of course, you can train and test SparseViT within our proposed IMDL-BenCo framework, as they are fully compatible.

Citation

If you find our code useful, please consider citing us and give us a star!

@misc{su2024can,
  title={Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  year={2024},
  eprint={2412.14598},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}