This project is part of my MSc thesis (https://pergamos.lib.uoa.gr/uoa/dl/frontend/el/browse/3256221), completed during my postgraduate studies in Data Science and Information Technologies at NKUA.
In this project, we propose new masking strategies that achieve higher k-NN and linear probing scores and accelerate learning on downstream tasks. Given the computational cost these methods face, we run experiments at different dataset scales and numbers of training epochs and show their impact on the scores. Finally, we introduce a new loss function based on contrastive learning and achieve improvements over the baseline when it is combined with different masking strategies.
- Mask generation from the raw attention maps of different layers
- Rollout method for mask generation
- Pre-processing of attention maps with log and power functions for competitive MIM
- Multi-layer mask generation
To better understand how each mask is generated, see the Thesis, page 58; a rough sketch of the general mechanism follows below.
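All of the strategies above derive the MIM mask from the model's attention maps. As a rough illustration only (the exact procedures are described in the thesis), here is a minimal PyTorch sketch; the function signatures, the `mask_ratio` default, and the choice between top-k selection and attention-proportional sampling are assumptions of this example, not the repository's API.

```python
import torch

def attention_mask(attn, mask_ratio=0.4, transform=None, power=0.5, sample=False):
    """Derive a MIM mask from a CLS-to-patch attention map.

    attn: attention of the [CLS] token to all patches, averaged over heads,
          taken from one chosen layer; shape (batch, num_patches).
    transform: None, 'power', or 'log' pre-processing of the map.
    Returns a boolean mask (batch, num_patches), True = patch is masked.
    """
    attn = attn.clamp(min=1e-8)
    if transform == "power":      # attn ** p: p < 1 flattens, p > 1 sharpens
        attn = attn ** power
    elif transform == "log":      # log1p compresses the dynamic range
        attn = torch.log1p(attn)  # stays positive, usable as sampling weights
    num_mask = int(mask_ratio * attn.shape[1])
    if sample:
        # Sample in proportion to the (pre-processed) map, so the
        # transform changes the selection distribution.
        idx = torch.multinomial(attn, num_mask, replacement=False)
    else:
        # AttMask-style: deterministically mask the most-attended patches.
        idx = attn.topk(num_mask, dim=1).indices
    mask = torch.zeros_like(attn, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

def rollout(attn_layers):
    """Attention rollout: multiply per-layer attention matrices, each
    averaged over heads and mixed with the identity to account for
    residual connections.

    attn_layers: list of tensors, each (batch, tokens, tokens).
    Returns the CLS-to-patch row of the rolled-out attention.
    """
    result = None
    for a in attn_layers:
        a = a + torch.eye(a.shape[-1], device=a.device)  # residual connection
        a = a / a.sum(dim=-1, keepdim=True)              # re-normalize rows
        result = a if result is None else result @ a
    return result[:, 0, 1:]  # CLS row, patch columns
```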
- We conduct all experiments on the ImageNet dataset. During training, we use either the full ImageNet train set or a subset, the first 20% of training samples per class (a sketch of building such a subset follows this list).
- For evaluation of the models, we use the full ImageNet validation set.
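A minimal sketch of how a "first 20% per class" subset could be selected, assuming the standard ImageNet ImageFolder layout (one directory per class); the sorting and the `fraction` parameter are illustrative, and the thesis describes the exact protocol:

```python
import os

def first_fraction_per_class(train_dir, fraction=0.2):
    """Return image paths covering the first `fraction` of samples per class."""
    subset = []
    for cls in sorted(os.listdir(train_dir)):
        cls_dir = os.path.join(train_dir, cls)
        files = sorted(os.listdir(cls_dir))        # deterministic order
        keep = max(1, int(len(files) * fraction))  # first 20% of this class
        subset += [os.path.join(cls_dir, f) for f in files[:keep]]
    return subset
```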
To clone the repo:

```bash
git clone https://github.com/DimitrisReppas/MIM_on_self-supervision.git
```
The requirements of the project can be found here.
- HPC resources were provided by GENCI-IDRIS (Grant 2020-AD011013552).
- The code is designed for distributed parallel processing; a minimal sketch of such a setup follows this list.
- All implementation details can be found in the Thesis, page 61.
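For context, here is a minimal sketch of the kind of PyTorch distributed setup such Slurm scripts typically launch; the environment variables, the backend choice, and the `setup_distributed` helper are assumptions of this example, not the repository's code:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model):
    # One process per GPU; RANK/WORLD_SIZE/LOCAL_RANK are usually
    # exported by the launcher (Slurm or torchrun).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DDP synchronizes gradients across processes at each backward pass.
    return DDP(model, device_ids=[local_rank])
```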
To conduct a training experiment:

```bash
sbatch train_example.slurm
```
Try the proposed masking strategies by changing the following arguments:

- `--mask`
- `--layer_mask`
- `--power`
To train the models with the proposed contrastive term, replace `multimask_main_ibot.py` with `contrastive_3rd_term_main_ibot.py` in `train_example.slurm`.
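For intuition, here is a generic InfoNCE-style contrastive term; the exact third term used in this project is defined in the thesis, and the temperature, the symmetrization, and the use of in-batch negatives here are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def contrastive_term(z1, z2, temperature=0.2):
    """z1, z2: (batch, dim) embeddings of two views of the same images.

    Each sample's positive is its other view; all remaining samples in
    the batch act as negatives.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(z1.shape[0], device=z1.device)
    # Symmetrized cross-entropy: match view 1 to view 2 and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```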
All the hyperparameters can be found in the Thesis, page 61.
Use k-NN to evaluate the models:

```bash
sbatch k-nn_example.slurm
```
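The Slurm script runs the full evaluation; for intuition, here is a minimal sketch of k-NN classification on frozen, L2-normalized features (the value of k and the majority vote are simplifying assumptions; similarity-weighted voting is also common):

```python
import torch

@torch.no_grad()
def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Features are L2-normalized (n, dim) tensors; labels are (n,)."""
    sims = test_feats @ train_feats.t()   # cosine similarity to train set
    _, idx = sims.topk(k, dim=1)          # k nearest training samples
    topk_labels = train_labels[idx]       # (num_test, k)
    # Majority vote among the k nearest neighbors.
    return torch.mode(topk_labels, dim=1).values
```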
Use linear probing to evaluate the models:

```bash
sbatch linear_probing_example.slurm
```
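Similarly, a minimal sketch of linear probing: a single linear classifier trained on top of frozen backbone features. The optimizer and learning rate here are illustrative, not the project's hyperparameters (those are in the thesis, page 61):

```python
import torch
import torch.nn as nn

def linear_probe(backbone, loader, feat_dim, num_classes, epochs=10):
    backbone.eval()                        # the backbone stays frozen
    head = nn.Linear(feat_dim, num_classes).cuda()
    opt = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():          # no gradients to the backbone
                feats = backbone(images.cuda())
            loss = nn.functional.cross_entropy(head(feats), labels.cuda())
            opt.zero_grad(); loss.backward(); opt.step()
    return head
```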
Below, some results of the project are presented:
- Linear probing and k-NN scores of AttMask for different layers (trained on the ImageNet subset)
- Evaluation of Rollout-based masking strategies with k-NN and linear probing (trained on the ImageNet subset)
- Linear probing and k-NN evaluation of masking strategies based on pre-processing the attention maps with power and log functions (trained on the ImageNet subset)
- Evaluation of the multi-layer masking and multi-crop strategies with k-NN and linear probing (trained on the ImageNet subset)
- Evaluation of masking strategies with k-NN and linear probing (trained on full ImageNet)
- A closer look at the linear probing plots
- Contrastive learning results (trained on the ImageNet subset)
This repository is built on the iBOT repository and inspired by the AttMask paper.