Codebase for the paper *A Data-Driven Solution for the Cold Start Problem in Biomedical Image Classification*, submitted to ISBI 2024.
- `src/` contains all the reusable source code.
- `src/datasets/data/` contains all the image datasets, each with `train/`, `val/`, and `test/` subfolders.
- `src/models/data/` contains all the registered models, precomputed features, and metadata.
- `tasks/` contains all the scripts that use the source code to generate results, which are saved in `outputs/`.
- Hydra is used for configuration; the config files are saved in `conf/`.
- Weights & Biases is used for tracking runs and as storage for results.
The code has been tested with Python 3.10.12. Install the dependencies with:

```
pip install -r requirements.txt
```
- Put your Weights & Biases API key in `config.ini` (a minimal sketch is shown after this list).
- Run the following to download all the datasets used in the paper:

```
python -m tasks.datasets.download -d matek -d isic -d retinopathy -d jurkat
```
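The layout of `config.ini` is not documented here; a minimal sketch of what it might contain (the section and key names are assumptions, not taken from the repository):

```ini
[wandb]
api_key = YOUR_WANDB_API_KEY
```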
To train SimCLR on the `matek` dataset, run:

```
python -m tasks.training.train_simclr dataset=matek
```

To train SwAV on the `matek` dataset, run:

```
python -m tasks.training.train_swav dataset=matek
```

To train DINO on the `matek` dataset, run:

```
python -m tasks.training.train_dino dataset=matek
```
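The other downloaded datasets can presumably be selected the same way through the Hydra `dataset` override; for example:

```
python -m tasks.training.train_simclr dataset=isic
```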
To register a model's weights after training, run the following:

```
python -m tasks.backups.add_new_model \
    --path=[path to model weights in lightning_logs/] \
    --type=[simclr/swav/dino] \
    --version=[v1/v2/...] \
    --dataset=[dataset name]
```
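For example, registering a freshly trained SimCLR checkpoint might look like this (the checkpoint path is illustrative; use the actual path produced under `lightning_logs/`):

```
python -m tasks.backups.add_new_model \
    --path=lightning_logs/version_0/checkpoints/last.ckpt \
    --type=simclr \
    --version=v1 \
    --dataset=matek
```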
To extract features from a self-supervised model, run the following:

```
python -m tasks.inference.extract_features \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...]
```
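For example, using the SimCLR model registered above:

```
python -m tasks.inference.extract_features \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1
```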
To precompute features from a self-supervised model, applying a number of random augmentations to each image, run the following:

```
python -m tasks.inference.precompute_features \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    num_augmentations=[number of random augmentations for each image]
```
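For example (the augmentation count below is an illustrative value, not a recommendation from the paper):

```
python -m tasks.inference.precompute_features \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1 \
    num_augmentations=16
```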
This allows us to save a lot of computation time in the Classifier Training step, since we freeze the weights of the encoder and train only the classifier head.
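To illustrate why freezing makes this cheap, here is a minimal PyTorch sketch of the idea; the tiny encoder and dimensions are stand-ins, not the repository's actual models:

```python
import torch
from torch import nn

# Stand-in encoder; in the repository this is a registered self-supervised model.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in encoder.parameters():
    p.requires_grad = False  # freeze the encoder

head = nn.Linear(8, 4)  # only the classifier head receives gradient updates

with torch.no_grad():  # features can therefore be computed once and cached
    feats = encoder(torch.randn(2, 3, 32, 32))

loss = nn.functional.cross_entropy(head(feats), torch.tensor([0, 1]))
loss.backward()  # gradients flow into the head only
```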
To train the classifier head(s) and aggregate them into a final model using Model Soups, run the following:

```
python -m tasks.training.linear_classifier \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    training.learning_rate=null
```
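For context, a Model Soup averages the weights of several trained heads into a single model. A minimal sketch of a uniform soup over linear heads (illustrative only, not the repository's implementation):

```python
import torch
from torch import nn

def uniform_soup(heads: list[nn.Linear]) -> nn.Linear:
    """Average the parameters of identically-shaped linear heads."""
    souped = nn.Linear(heads[0].in_features, heads[0].out_features)
    with torch.no_grad():
        souped.weight.copy_(torch.stack([h.weight for h in heads]).mean(dim=0))
        souped.bias.copy_(torch.stack([h.bias for h in heads]).mean(dim=0))
    return souped

# e.g. heads trained with different hyperparameters or data subsets
final_head = uniform_soup([nn.Linear(8, 4) for _ in range(3)])
```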
Feel free to tinker with the `features`, `kmeans`, and `soup` configurations.
To train the classifier head on all of the annotated data, with proper hyperparameter selection on the validation set, run the following:

```
python -m tasks.training.linear_classifier_ceiling \
    dataset=[dataset name] \
    training.weights.type=[simclr/swav/dino] \
    training.weights.version=[v1/v2/...] \
    training.learning_rate=null \
    training.train_samples=null
```
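For example, to compute this ceiling for the SimCLR `v1` weights on `matek`:

```
python -m tasks.training.linear_classifier_ceiling \
    dataset=matek \
    training.weights.type=simclr \
    training.weights.version=v1 \
    training.learning_rate=null \
    training.train_samples=null
```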
Check `tasks/vis/` for all the scripts to visualise the results.