GitHub - RichardObi/mammo_dp: Official repository of "Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data"

Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data

In MICCAI 2024 Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care.

Getting Started

Datasets

If you prefer to use our processed dataset of extracted malignant and benign masses directly, you can find the train, validation, and test splits in dataset16062024.

If you would like to set up your own data processing pipeline, you can find the CBIS-DDSM Dataset used in this study on The Cancer Imaging Archive (TCIA). The Breast Cancer Digital Repository (BCDR) Dataset, which was used as the external test set in this study, is available upon request at the BCDR Website.

Synthetic Data

You can find the synthetic data used in this study in the folder extension/synthetic_data/cbis-ddsm.

If you would prefer to generate your own synthetic data using our MCGAN model, you can do so via the medigan library, which loads the model weights used in this study from Zenodo and generates malignant and benign masses.

To generate the masses, simply run:

pip install medigan

# import medigan and initialize Generators
from medigan import Generators
generators = Generators()

# generate 1000 samples with model 8 (00008_C-DCGAN_MMG_MASSES). 
# Also, auto-install required model dependencies.
generators.generate(model_id='00008_C-DCGAN_MMG_MASSES', num_samples=1000, install_dependencies=True)
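Because a single MCGAN model generates both malignant and benign masses, its generator has to be told which class to produce. One common way to condition a DCGAN-style generator is to append a one-hot class label to the latent noise vector. The NumPy sketch below shows only that input construction; whether MCGAN uses exactly this scheme is an assumption, and the names `conditional_latent` and `NUM_CLASSES`, the latent size of 100, and the label encoding (0 = benign, 1 = malignant) are all hypothetical.

```python
import numpy as np

# Hypothetical label encoding (assumption): 0 = benign, 1 = malignant.
NUM_CLASSES = 2


def conditional_latent(batch_size, latent_dim, labels, rng=None):
    """Concatenate Gaussian noise with one-hot malignancy labels.

    Illustrates one common conditioning scheme for a conditional DCGAN
    generator input; not necessarily the exact MCGAN implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels, dtype=int)
    if labels.shape != (batch_size,):
        raise ValueError("expected exactly one label per sample")
    z = rng.standard_normal((batch_size, latent_dim))  # latent noise
    one_hot = np.eye(NUM_CLASSES)[labels]  # shape (batch_size, NUM_CLASSES)
    return np.concatenate([z, one_hot], axis=1)


# Usage: four generator inputs, two benign and two malignant.
x = conditional_latent(4, latent_dim=100, labels=[0, 0, 1, 1], rng=np.random.default_rng(0))
print(x.shape)  # (4, 102)
```

At inference time, choosing the label column is what switches the generator between benign and malignant outputs, while the noise part controls variation within a class.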

Running Experiments

Classification Code

Script to create an environment and run all experiments reported in the paper.
Configs to run the different Swin Transformer experiments.
Config description Excel file explaining the different dbr experiments alongside their experimental results.
Code to train, validate, and test our Swin Transformer classification model with or without differentially private stochastic gradient descent (DP-SGD).
CBIS-DDSM train/test splits and the BCDR external test set. The final dataset with splits is also available here.
Paths to the original datasets after downloading them locally.
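For readers unfamiliar with DP-SGD, a single update step can be sketched as follows: clip each example's gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to that clipping norm, and average. This is a minimal NumPy illustration on a binary logistic-regression model, not the repository's Swin Transformer training code (which may rely on a dedicated DP library); the helper `dp_sgd_step` and all parameter values are hypothetical, and privacy accounting (tracking epsilon and delta) is omitted.

```python
import numpy as np


# Hypothetical helper, for illustration only; not the repository's training code.
def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step for binary logistic regression with labels y in {0, 1}."""
    # Per-example gradients of the logistic loss: (sigmoid(x . w) - y) * x
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grads = (p - y)[:, None] * X  # shape (batch_size, n_features)

    # Clip each example's gradient to an L2 norm of at most clip_norm
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip_norm)

    # Sum the clipped gradients, add Gaussian noise with std noise_multiplier * clip_norm,
    # then average over the batch
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (grads.sum(axis=0) + noise) / len(X)

    # Plain gradient-descent update with the privatized gradient
    return w - lr * noisy_grad


# Usage on random toy data (values are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
w = dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(w.shape)
```

Clipping bounds each individual example's influence on the update, which is what allows the added noise to mask any single training sample; without noise, the update norm can never exceed `lr * clip_norm`.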

Synthesis Code

Script to create an environment and train the Malignancy-Conditioned GAN (MCGAN), e.g., the model used to create the synthetic data reported in the paper.
Config to define the setup and hyperparameters of an MCGAN training run.
Code to start an MCGAN training run.
Code and checkpoint for running MCGAN inference in a local setup (by running the __init__.py file).
FRD metric used in the paper to evaluate the synthetic data based on the variability of radiomics imaging biomarkers between the real and synthetic image distributions.
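The sketch below assumes that FRD is a Fréchet distance computed over radiomics feature vectors, analogous to FID: each feature set is summarized by its mean and covariance, and the distance is d² = ||mu1 - mu2||² + Tr(S1 + S2 - 2(S1 S2)^(1/2)). The repository's implementation, including radiomics feature extraction and any feature normalization, may differ; here the features are assumed to be already extracted, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_real, feats_syn):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_*: arrays of shape (n_images, n_features), e.g. radiomics features.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_syn.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_syn, rowvar=False)
    # Tr((s1 s2)^(1/2)) via the symmetric PSD matrix s1^(1/2) s2 s1^(1/2),
    # which has the same eigenvalues as s1 s2.
    r1 = _sqrtm_psd(s1)
    m = r1 @ s2 @ r1
    cross = _sqrtm_psd((m + m.T) / 2.0)  # symmetrize against round-off
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * np.trace(cross))


# Usage with random stand-in features (not real radiomics data)
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))
print(frechet_distance(real, real))        # approximately 0.0
print(frechet_distance(real, real + 2.0))  # approximately 12.0 (pure mean shift)
```

Lower values indicate that the synthetic feature distribution is closer to the real one. The covariance estimates are only reliable when there are clearly more images than features.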

Summary

Reference

Please consider citing our work if you find it useful for your research:

@article{osuala2024enhancing,
  title={{Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data}},
  author={Richard Osuala and Daniel M. Lang and Anneliese Riess and Georgios Kaissis and Zuzanna Szafranowska and Grzegorz Skorupko and Oliver Diaz and Julia A. Schnabel and Karim Lekadir},
  journal={arXiv preprint arXiv:2407.12669},
  url={https://arxiv.org/abs/2407.12669},
  year={2024}
}

Acknowledgements

This repository borrows and extends the code from the mammo_gans repository.

Folders and files

dataset16062024
docs
extension/synthetic_data/cbis-ddsm/c-dcgan
gan_compare
setup
.gitignore
LICENSE
Pipfile
Pipfile.lock
README.md
dbr.sh
gan.sh
requirements.txt
