In MICCAI 2024 Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care.
If you prefer to directly use our processed dataset of extracted malignant and benign masses, you can find our train, validation, and test splits in dataset16062024.
If you would like to set up your own data processing pipeline, you can find the CBIS-DDSM dataset used in this study on The Cancer Imaging Archive (TCIA). The Breast Cancer Digital Repository (BCDR) dataset, which was used as an external test set in this study, is available upon request on the BCDR website.
You can find the synthetic data used in this study in the folder extension/synthetic_data/cbis-ddsm.
If you would prefer to generate your own synthetic data using our MCGAN model, you can do so via the medigan library, which loads the model weights used in this study from Zenodo and generates malignant and benign masses.
To generate the masses, simply run:
```bash
pip install medigan
```

```python
# import medigan and initialize Generators
from medigan import Generators
generators = Generators()

# generate 1000 samples with model 8 (00008_C-DCGAN_MMG_MASSES).
# Also, auto-install required model dependencies.
generators.generate(model_id='00008_C-DCGAN_MMG_MASSES', num_samples=1000, install_dependencies=True)
```
- Script to create an environment and run all experiments reported in the paper.
- Configs to run the different Swin Transformer experiments.
- Excel file describing the configs of the different dbr experiments, alongside the respective experimental results.
- Code to train, validate, and test our Swin Transformer classification model with or without differentially private stochastic gradient descent (DP-SGD).
- CBIS-DDSM train/test splits and the BCDR external test set. The final dataset with splits is also available here.
- Paths to the original datasets after downloading them locally.
- Script to create an environment and train the Malignancy-Conditioned GAN (MCGAN), which was then used, e.g., to create the synthetic data reported in the paper.
- Config to define the setup and hyperparameters of an MCGAN training run.
- Code to start an MCGAN training run.
- Code and checkpoint for running MCGAN inference in a local setup (by running the `__init__.py` file).
- FRD metric used in the paper to evaluate the synthetic data based on the variability of radiomics imaging biomarkers between the real and synthetic image distributions.
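The classification code can train with or without DP-SGD. As a rough illustration of what DP-SGD changes relative to plain SGD, here is a minimal NumPy sketch of a single update for a simple binary logistic-regression model, following the standard DP-SGD recipe: clip each example's gradient to a maximum L2 norm, sum, add Gaussian noise scaled to that bound, then average. This is not the repository's implementation (which trains a Swin Transformer); the function `dp_sgd_step` and its parameters are hypothetical names chosen for this sketch.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD update for binary logistic regression (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-example gradients of the logistic loss w.r.t. w: (sigmoid(x @ w) - y) * x
    probs = 1.0 / (1.0 + np.exp(-(X @ w)))
    per_example_grads = (probs - y)[:, None] * X  # shape (batch_size, num_features)
    # Clip each example's gradient to L2 norm <= clip_norm to bound its influence
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)
    # Add Gaussian noise with std = noise_multiplier * clip_norm to the summed gradient
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean_grad = (clipped_sum + noise) / X.shape[0]
    return w - lr * noisy_mean_grad


# Toy usage: noisy steps on synthetic, linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = np.zeros(2)
for _ in range(100):
    w = dp_sgd_step(w, X, y, lr=0.5, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Setting `noise_multiplier=0` and a very large `clip_norm` recovers an ordinary mini-batch gradient step, which makes the two DP-SGD ingredients easy to isolate. A real DP-SGD setup additionally needs a privacy accountant to track the (ε, δ) budget over training, which this sketch omits.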
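The FRD metric compares real and synthetic image distributions via radiomics features. Assuming it follows the same construction as the Fréchet Inception Distance, i.e. fitting a Gaussian to each set of feature vectors and computing d² = ‖μ_r − μ_s‖² + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^{1/2}), the core computation could be sketched as follows. Radiomics feature extraction is not shown; `frechet_distance` and its inputs are hypothetical, and the repository's FRD code may differ, e.g. in feature selection or normalization.

```python
import numpy as np


def frechet_distance(feats_real, feats_synth):
    """Squared Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_images, num_features) with num_features >= 2;
    the features are assumed to be precomputed radiomics features.
    """
    mu_r, mu_s = feats_real.mean(axis=0), feats_synth.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_s = np.cov(feats_synth, rowvar=False)
    # Tr((cov_r @ cov_s)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_s, which are real and non-negative up to numerical error.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_covmean)


# Toy usage with random stand-ins for radiomics feature matrices
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 5))
synth = rng.normal(loc=0.5, size=(500, 5))
print(frechet_distance(real, synth))
```

Using the eigenvalues of Σ_r Σ_s avoids an explicit matrix square root; FID implementations commonly use `scipy.linalg.sqrtm` instead, which yields the same trace.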
Please consider citing our work if you find it useful for your research:
```bibtex
@article{osuala2024enhancing,
  title={{Enhancing the Utility of Privacy-Preserving Cancer Classification using Synthetic Data}},
  author={Richard Osuala and Daniel M. Lang and Anneliese Riess and Georgios Kaissis and Zuzanna Szafranowska and Grzegorz Skorupko and Oliver Diaz and Julia A. Schnabel and Karim Lekadir},
  journal={arXiv preprint arXiv:2407.12669},
  url={https://arxiv.org/abs/2407.12669},
  year={2024}
}
```
This repository borrows and extends the code from the mammo_gans repository.