Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer

Official repository for the AAAI 2025 paper "Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer" [paper] [website].

In summary, SparseViT leverages the distinction between semantic and non-semantic features, enabling the model to adaptively extract non-semantic features that are more critical for image manipulation localization. This provides a novel approach to precisely identifying manipulated regions.

Test setup (Code + Models)

1) Set up the coding environment

First, clone the repository:

git clone https://github.com/scu-zjz/SparseViT.git

Our environment

Ubuntu 20.04.1 LTS

CUDA 11.5 + cuDNN 8.4.0

Python 3.10

PyTorch 2.4

Install the packages listed in requirements.txt:

pip install -r requirements.txt
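To compare a local machine against the environment listed above, a short check like the following may help. It is only a sketch and not part of the repository: the function name `describe_environment` is hypothetical, and the script skips the PyTorch fields if `torch` is not installed yet.

```python
import importlib.util
import platform


def describe_environment():
    """Collect the versions relevant to the setup described in this README."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "cuda": None,
        "cuda_available": False,
    }
    # Import torch only if it is installed, so the check also runs before `pip install`.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None on CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Versions that differ from the ones listed may still work; the list records what the authors used, not a strict requirement.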

2) Download our pretrained checkpoints

Download our pretrained checkpoints from Google Drive and place them in the checkpoint directory.
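Before running the test script, it can be useful to confirm that the download actually landed in the checkpoint directory. The sketch below assumes only the directory name `checkpoint` given above; the helper `list_checkpoints` is hypothetical and makes no assumption about the checkpoint file names.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="checkpoint"):
    """Return the regular files found in the checkpoint directory, sorted by path."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    found = list_checkpoints()
    if not found:
        print("No files in ./checkpoint yet; download the weights first.")
    for path in found:
        size_mb = path.stat().st_size / 1e6
        print(f"{path.name}: {size_mb:.1f} MB")
```

A file of only a few kilobytes usually indicates an interrupted or redirected download rather than real weights.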

Scripts

This should be super easy! Simply run:

python main_test.py

This script provides only a basic test of SparseViT. You can also train and test SparseViT within our IMDL-BenCo framework, with which it is fully compatible.

Citation

If you find our code useful, please consider citing us and giving us a star!

@misc{su2024can,
  title={Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Spare-Coding Transformer},
  author={Su, Lei and Ma, Xiaochen and Zhu, Xuekang and Niu, Chaoqun and Lei, Zeyu and Zhou, Ji-Zhe},
  year={2024},
  eprint={2412.14598},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
