
LoRS: Low-Rank Similarity Mining

This repo contains the code of our ICML'24 work LoRS: Low-Rank Similarity Mining for Multimodal Dataset Distillation. LoRS proposes to learn the similarity matrix jointly while distilling the images and texts. This simple, plug-and-play method yields significant performance gains. Please check our paper for more analysis.

Method
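At a high level, LoRS learns the image-text similarity matrix jointly with the distilled data instead of fixing it to the identity. The snippet below is only an illustrative sketch of one possible low-rank parameterization of such a learnable similarity matrix, paired with a soft-target contrastive loss; the class and function names are invented here for illustration, and the exact formulation used in this repo may differ (please refer to the paper and the distillation code).

```python
# Illustrative sketch only: a learnable low-rank similarity matrix of the form
# S = diag(alpha) + U @ V^T, used as soft targets for a contrastive loss.
# This is NOT necessarily the exact parameterization used in LoRS.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankSimilarity(nn.Module):
    """Learnable similarity targets for N distilled image-text pairs."""

    def __init__(self, num_pairs: int, rank: int = 8):
        super().__init__()
        # Diagonal term keeps matched pairs dominant at initialization.
        self.alpha = nn.Parameter(torch.ones(num_pairs))
        # Low-rank factors capture off-diagonal (cross-pair) similarity.
        self.u = nn.Parameter(torch.randn(num_pairs, rank) * 0.01)
        self.v = nn.Parameter(torch.randn(num_pairs, rank) * 0.01)

    def forward(self) -> torch.Tensor:
        return torch.diag(self.alpha) + self.u @ self.v.t()


def soft_contrastive_loss(img_emb, txt_emb, sim_target, tau=0.07):
    """Cross-entropy against soft (learned) similarity targets instead of the
    identity matrix used in standard contrastive learning."""
    logits = img_emb @ txt_emb.t() / tau
    targets = F.softmax(sim_target, dim=-1)
    return F.cross_entropy(logits, targets)  # soft-label cross entropy
```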

Getting Started

Requirements: please see requirements.txt.

Pretrained model checkpoints: you may need to manually download the checkpoints of BERT and NFNet (from timm) and put them here (a scripted download sketch follows the tree below):

distill_utils/checkpoints/
├── bert-base-uncased/
│   ├── config.json
│   ├── LICENSE.txt
│   ├── model.onnx
│   ├── pytorch_model.bin
│   ├── vocab.txt
│   └── ......
└── nfnet_l0_ra2-45c6688d.pth
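If you prefer to script the downloads, the following is a minimal sketch using huggingface_hub and timm. The torch-hub cache path is the library default and is an assumption about your environment; the checkpoint filename matches the tree above.

```python
# Sketch: fetch the pretrained checkpoints into distill_utils/checkpoints/.
# Assumes huggingface_hub and timm are installed; the torch-hub cache path
# below is the default and may differ on your machine.
from pathlib import Path
import shutil

from huggingface_hub import snapshot_download
import timm

ckpt_dir = Path("distill_utils/checkpoints")
ckpt_dir.mkdir(parents=True, exist_ok=True)

# BERT: download the full bert-base-uncased snapshot from the Hugging Face Hub.
snapshot_download(repo_id="bert-base-uncased",
                  local_dir=ckpt_dir / "bert-base-uncased")

# NFNet: let timm download its pretrained weights, then copy the cached file.
timm.create_model("nfnet_l0", pretrained=True)
cached = Path.home() / ".cache/torch/hub/checkpoints/nfnet_l0_ra2-45c6688d.pth"
if cached.exists():
    shutil.copy(cached, ckpt_dir / "nfnet_l0_ra2-45c6688d.pth")
```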

Datasets: please download the Flickr30K ([Train][Val][Test][Images]) and COCO ([Train][Val][Test][Images]) datasets and put them here (a quick layout check follows the tree):

./distill_utils/data/
├── Flickr30k/
│   ├── flickr30k-images/
│   │   ├── 1234.jpg
│   │   └── ......
│   ├── results_20130124.token
│   └── readme.txt
└── COCO/
    ├── train2014/
    ├── val2014/
    └── test2014/
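After downloading, a quick check such as the sketch below (paths taken from the tree above) can catch a misplaced folder early:

```python
# Sketch: verify the expected dataset layout before training.
from pathlib import Path

data_root = Path("./distill_utils/data")
expected = [
    data_root / "Flickr30k" / "flickr30k-images",
    data_root / "Flickr30k" / "results_20130124.token",
    data_root / "COCO" / "train2014",
    data_root / "COCO" / "val2014",
    data_root / "COCO" / "test2014",
]
for path in expected:
    status = "ok" if path.exists() else "MISSING"
    print(f"[{status}] {path}")
```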

Training the expert buffer: e.g., run sh sh/buffer_flickr.sh. Expert training takes days; you can manually split num_experts and run multiple processes in parallel (see the sketch below).
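A minimal sketch of such a parallel launch is shown below. The entry point buffer.py and the flags --num_experts / --buffer_path are assumptions made for illustration; check sh/buffer_flickr.sh for the actual script and arguments used in this repo.

```python
# Sketch: split expert-buffer training across several processes.
# The script name and flags below are assumptions -- check sh/buffer_flickr.sh
# for the real entry point and arguments.
import subprocess

NUM_PROCESSES = 4
EXPERTS_PER_PROCESS = 5  # e.g. 20 experts total, split 4 ways

procs = []
for i in range(NUM_PROCESSES):
    cmd = [
        "python", "buffer.py",                       # hypothetical entry point
        "--num_experts", str(EXPERTS_PER_PROCESS),   # smaller share per process
        "--buffer_path", f"buffers/flickr_part{i}",  # hypothetical, keeps runs separate
    ]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()
```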

Distilling with LoRS: e.g., run sh sh/distill_flickr_lors_100.sh. The distillation can be run on a single RTX 3090/4090 thanks to TESLA.

Citation

If you find our work useful and inspiring, please cite our paper:

@article{xu2024lors,
  title={Low-Rank Similarity Mining for Multimodal Dataset Distillation},
  author={Xu, Yue and Lin, Zhilin and Qiu, Yusong and Lu, Cewu and Li, Yong-Lu},
  journal={arXiv e-prints},
  pages={arXiv--2406},
  year={2024}
}

Acknowledgement

We follow the setting and code of VL-Distill and re-implement the algorithm with TESLA. We deeply appreciate their valuable contributions!
