Codebase for the paper *A Data-Driven Solution for the Cold Start Problem in Biomedical Image Classification*, submitted to ISBI 2024.
- `src/` contains all the reusable source code.
- `src/datasets/data/` contains all the image datasets, each with `train/`, `val/`, and `test/` subfolders.
- `src/models/data/` contains all the registered models, precomputed features, and metadata.
- `tasks/` contains all the scripts that use the source code to generate results, which are saved in `outputs/`.
- Hydra is used for configuration; the config files are saved in `conf/`.
- Weights & Biases is used for tracking runs and as storage for results.
The code has been tested with Python 3.10.12. Install the dependencies with:

```
pip install -r requirements.txt
```
- Put your Weights & Biases API key in `config.ini` (a minimal sketch is shown after this list).
- Run the following to download all the datasets used in the paper:

```
python -m tasks.datasets.download -d matek -d isic -d retinopathy -d jurkat
```
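The layout of `config.ini` is not documented here; a minimal sketch of what it might contain (the section and key names are assumptions, not taken from the repository):

```ini
[wandb]
api_key = YOUR_WANDB_API_KEY
```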
To train SimCLR on the `matek` dataset, run:

```
python -m tasks.training.train_simclr dataset=matek
```

To train SwAV on the `matek` dataset, run:

```
python -m tasks.training.train_swav dataset=matek
```

To train DINO on the `matek` dataset, run:

```
python -m tasks.training.train_dino dataset=matek
```
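The other downloaded datasets can presumably be selected the same way through the Hydra `dataset` override; for example:

```
python -m tasks.training.train_simclr dataset=isic
```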
To register a model's weights after training, run the following:

```
python -m tasks.backups.add_new_model \
    --path=[path to model weights in lightning_logs/] \
    --type=[simclr/swav/dino] \
    --version=[v1/v2/...] \
    --dataset=[dataset name]
```
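For example, registering a freshly trained SimCLR checkpoint might look like this (the checkpoint path is illustrative; use the actual path produced under `lightning_logs/`):

```
python -m tasks.backups.add_new_model \
    --path=lightning_logs/version_0/checkpoints/last.ckpt \
    --type=simclr \
    --version=v1 \
    --dataset=matek
```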
To extract features from a self-supervised model, run the following:

```
python -m tasks.inference.extract_features \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...]
```
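For example, using the SimCLR model registered above:

```
python -m tasks.inference.extract_features \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1
```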
To precompute features from a self-supervised model, applying a number of random augmentations to each image, run the following:

```
python -m tasks.inference.precompute_features \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    num_augmentations=[number of random augmentations for each image]
```
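For example (the augmentation count below is an illustrative value, not a recommendation from the paper):

```
python -m tasks.inference.precompute_features \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1 \
    num_augmentations=16
```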
This allows us to save a lot of computation time in the Classifier Training step, since we freeze the weights of the encoder and train only the classifier head.
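To illustrate why freezing makes this cheap, here is a minimal PyTorch sketch of the idea; the tiny encoder and dimensions are stand-ins, not the repository's actual models:

```python
import torch
from torch import nn

# Stand-in encoder; in the repository this is a registered self-supervised model.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in encoder.parameters():
    p.requires_grad = False  # freeze the encoder

head = nn.Linear(8, 4)  # only the classifier head receives gradient updates

with torch.no_grad():  # features can therefore be computed once and cached
    feats = encoder(torch.randn(2, 3, 32, 32))

loss = nn.functional.cross_entropy(head(feats), torch.tensor([0, 1]))
loss.backward()  # gradients flow into the head only
```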
To train the classifier head(s) and aggregate them into a final model using Model Soups, run the following:

```
python -m tasks.training.linear_classifier \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    training.learning_rate=null
```
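For context, a Model Soup averages the weights of several trained heads into a single model. A minimal sketch of a uniform soup over linear heads (illustrative only, not the repository's implementation):

```python
import torch
from torch import nn

def uniform_soup(heads: list[nn.Linear]) -> nn.Linear:
    """Average the parameters of identically-shaped linear heads."""
    souped = nn.Linear(heads[0].in_features, heads[0].out_features)
    with torch.no_grad():
        souped.weight.copy_(torch.stack([h.weight for h in heads]).mean(dim=0))
        souped.bias.copy_(torch.stack([h.bias for h in heads]).mean(dim=0))
    return souped

# e.g. heads trained with different hyperparameters or data subsets
final_head = uniform_soup([nn.Linear(8, 4) for _ in range(3)])
```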
Feel free to tinker with the `features`, `kmeans`, and `soup` configurations.
To train the classifier head on all of the annotated data, with proper hyperparameter selection on the validation set, run the following:

```
python -m tasks.training.linear_classifier_ceiling \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    training.learning_rate=null \
    training.train_samples=null
```
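For example, to compute this ceiling for the SimCLR `v1` weights on `matek`:

```
python -m tasks.training.linear_classifier_ceiling \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1 \
    training.learning_rate=null \
    training.train_samples=null
```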
Check `tasks/vis/` for all the scripts to visualise the results.