
Breast cancer recurrence prediction using longitudinal EHR data


joshsanyal/breast-cancer-recurrence-pred


Longitudinal Breast Cancer Recurrence Prediction

Breast cancer is the most common cancer in women globally and the fifth leading cause of cancer mortality worldwide. Each of the 3.5 million breast cancer survivors in the US faces an almost 30% probability of recurrence, and because recurrences are often detected late, only 1-1.5% of them are potentially curable. Earlier prediction of breast cancer recurrence could improve patient survival, increase quality of care, and save medical resources. Previous studies have attempted to predict recurrence from structured electronic health records (EHR), but the lack of standardization in recurrence surveillance and missing EHR data make such models unsuitable for clinical settings. Free-text clinic notes are widely available and may offer the greatest nuance and detail about a patient’s clinical status, yet they remain largely unexplored because free text is difficult to represent for computational analysis.

In this study, I present the first attempt to predict breast cancer recurrence 1 year in advance by leveraging unstructured clinical narratives in the EHR across a patient’s temporal sequence of visits. To handle the noisy, unstructured nature of free-text notes, I propose a weighted vectorization scheme that represents clinical narratives without supervision, trained on 92.6 million notes from 3 institutions. The recurrence prediction model is then trained jointly on manually curated data from 670 patients and weakly labeled data from 8,062 patients, achieving a ROC AUC of 0.94 on held-out test patients. This weak supervision approach improves accuracy while requiring less manual labeling effort, because it makes it possible to use realistic clinical datasets in which recurrence occurs in only a minority of patients. The model’s longitudinal design also allows predicted recurrence probabilities to be visualized over time, helping clinicians interpret its predictions, incorporate them into patient treatment, and potentially prevent recurrences.
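The README does not say how the weighted vectorization scheme is defined. Purely as an illustration, the sketch below assumes one common unsupervised baseline: each note becomes the IDF-weighted average of word vectors, and each visit becomes the mean of its note vectors, producing the per-visit sequence a longitudinal model could consume. The function names (`idf_weights`, `note_vector`, `visit_sequence`), the toy embeddings, and the choice of IDF weighting are all hypothetical assumptions, not the author's method.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus.

    ASSUMPTION: IDF is used here as an example weighting; the README does not
    specify the weights. Formula: ln((1 + n_docs) / (1 + df)) + 1.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each token at most once per note
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def note_vector(tokens, embeddings, idf):
    """IDF-weighted average of the word vectors of the tokens in one note.

    Tokens lacking an embedding or an IDF weight are skipped; a note with no
    usable tokens maps to the zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in embeddings and tok in idf:
            w = idf[tok]
            for i, value in enumerate(embeddings[tok]):
                total[i] += w * value
            weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def visit_sequence(visits, embeddings, idf):
    """Map a patient's ordered visits to one vector per visit.

    Each visit is a non-empty list of tokenized notes; the visit vector is the
    element-wise mean of its note vectors. Visit order is preserved.
    """
    sequence = []
    for notes in visits:
        note_vecs = [note_vector(n, embeddings, idf) for n in notes]
        sequence.append([sum(col) / len(note_vecs) for col in zip(*note_vecs)])
    return sequence


if __name__ == "__main__":
    # Hypothetical toy data: three tokenized notes and 2-d word vectors.
    corpus = [["mass", "biopsy", "negative"], ["mass", "recurrence"], ["follow", "up", "negative"]]
    embeddings = {"mass": [1.0, 0.0], "biopsy": [0.0, 1.0],
                  "negative": [0.5, 0.5], "recurrence": [1.0, 1.0]}
    idf = idf_weights(corpus)
    visits = [[["mass"], ["biopsy"]], [["recurrence"]]]
    print(visit_sequence(visits, embeddings, idf))  # prints [[0.5, 0.5], [1.0, 1.0]]
```

IDF weighting down-weights tokens that appear in nearly every note, such as templated boilerplate, which is one plausible reason to weight word vectors at all when representing noisy clinical text.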
