Contains notebooks with step-by-step written commentary. Data is not allowed to be shared, so the files needed to reproduce the findings are not included. See the file below titled "UK Biobank Data Mining" for the code that creates the data files needed for the analysis.
The dataframes and text files that our notebooks import to create the final cohorts and conduct the analyses are not included because we are not allowed to distribute UKBB data; they can be requested.
Contains all code used to conduct the GWAS for the seven pre-diabetic groups. The output is analyzed further in the Jupyter notebook below titled "Analysis of GWAS.ipynb".
Contains the analysis of the GWAS output.
Contains all the data mining, manipulation, and patient selection for the final cohort to be used in the study.
Contains the age, sex, ethnicity, and patient counts, along with statistical testing of these variables, for progressors vs. non-progressors and for the unsupervised pre-diabetic clusters.
Contains all figures created manually.
Pre-diabetic Clustering and Statistical Analyses of Metabolomics, Comorbidities, Progression Outcomes, and Traditional Risk Factors.ipynb
Contains the clustering itself and the statistical analyses of the metabolomics, comorbidities, progression outcomes, and traditional risk factors.
Contains the method for randomly assigning patients to seven groups and all statistical analyses conducted for these groups.
Contains all analyses for the supervised model, which uses the metabolomics data to predict the group each pre-diabetic patient was assigned to by the unsupervised clustering.
Contains the code used to mine the data needed to create our final patient population and to extract the features used as the traditional risk factor variables.
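
For readers without access to the notebooks, the random assignment of patients to seven groups mentioned above might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the repository: the function name `assign_random_groups`, the fixed seed, the placeholder patient IDs, and the choice of near-equal group sizes are all hypothetical. The actual notebook may, for example, match the group sizes to the cluster sizes instead.

```python
import random
from collections import Counter


def assign_random_groups(patient_ids, n_groups=7, seed=0):
    """Shuffle patient IDs and deal them into n_groups near-equal groups.

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    # Dealing round-robin keeps group sizes within one patient of each other.
    return {pid: i % n_groups + 1 for i, pid in enumerate(shuffled)}


# Placeholder identifiers only; no UK Biobank data is used here.
example_ids = [f"patient_{i}" for i in range(20)]
groups = assign_random_groups(example_ids)
group_sizes = Counter(groups.values())
```

Here `groups` maps each placeholder ID to a group label from 1 to 7, and `group_sizes` gives the number of patients per group, which can then feed the same statistical comparisons run on the real clusters.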