Sparse ModernBERT

This repository contains the code used for the ModernBERT experiments in the AdaSplash paper: https://arxiv.org/abs/2502.12082

Installation

Follow the installation steps in the original ModernBERT repository: https://github.com/AnswerDotAI/ModernBERT

Models on Hugging Face

Pretrained checkpoints such as sardinelab/SparseModernBERT-alpha1.5 (used in the example below) are available on the Hugging Face Hub.

Training

Check the scripts:

  • Pretrain ModernBERT on MLM: train_modernbert.sh
  • Finetune on recall tasks: examples/run_st.sh

Evaluating

Load the model:

from transformers import AutoTokenizer
from sparse_modern_bert import CustomModernBertModel

model_id = "sardinelab/SparseModernBERT-alpha1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code is required because the checkpoint ships the custom
# sparse-attention modeling code.
model = CustomModernBertModel.from_pretrained(model_id, trust_remote_code=True)

An example is provided in examples/evaluate_st_sparse.py.
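For a quick sanity check without the full script, the sketch below embeds two sentences with the loaded model and compares them. The mean pooling and cosine scoring here are illustrative assumptions, not necessarily the exact setup used in examples/evaluate_st_sparse.py.

import torch
import torch.nn.functional as F

sentences = ["adaptive sparse attention", "dense softmax attention"]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    # Final hidden states, shape (batch, seq_len, hidden).
    hidden = model(**inputs).last_hidden_state

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(F.cosine_similarity(emb[0:1], emb[1:2]).item())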

AdaSplash

AdaSplash is an efficient adaptive sparse attention mechanism (α-entmax attention) implemented in Triton. See the repo: https://github.com/deep-spin/adasplash
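For intuition about what "adaptive sparse" means here, below is a minimal pure-PyTorch sketch of α-entmax, the mapping that AdaSplash computes efficiently: bisection finds the threshold τ at which the transformed scores sum to 1, and every score below τ gets exactly zero attention weight. This is a reference illustration only, not the Triton kernel from the repo, and it makes no efficiency claims.

import math
import torch

def alpha_entmax(scores, alpha=1.5, dim=-1, n_iter=50):
    """alpha-entmax via bisection on the threshold tau (reference only)."""
    x = (alpha - 1) * scores
    # At tau = max(x) the probability mass is 0; at tau = max(x) - 1 it is
    # >= 1, so the exact threshold lies in between and bisection converges.
    hi = x.max(dim=dim, keepdim=True).values
    lo = hi - 1
    for _ in range(n_iter):
        tau = (lo + hi) / 2
        mass = torch.clamp(x - tau, min=0).pow(1 / (alpha - 1)).sum(dim, keepdim=True)
        lo = torch.where(mass >= 1, tau, lo)
        hi = torch.where(mass >= 1, hi, tau)
    p = torch.clamp(x - (lo + hi) / 2, min=0).pow(1 / (alpha - 1))
    return p / p.sum(dim, keepdim=True)  # renormalize away bisection error

# Toy attention: many weights come out exactly zero, unlike softmax.
q, k, v = torch.randn(3, 8, 64).unbind(0)
scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))
out = alpha_entmax(scores, alpha=1.5) @ v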

Reference

@article{goncalves2025adasplash,
  title={AdaSplash: Adaptive Sparse Flash Attention},
  author={Nuno Gonçalves and Marcos Treviso and André F. T. Martins},
  journal={arXiv preprint arXiv:2502.12082},
  url={https://arxiv.org/abs/2502.12082},
  year={2025}
}
