Identifying Biomarkers for Cancer Diagnosis with Machine Learning

This research explores the use of machine learning (ML) models to identify the most important biomarkers for diagnosing cancer. The data used in this research is an extremely high-dimensional dataset that represents various cancer biomarkers.

Data

The dataset used in this research contains a large number of samples and features. It includes measurements of gene expressions and protein concentrations of various cancer biomarkers. The data was obtained from a publicly available database.

Methodology

This study employed several classes of ML models, including linear models, non-linear models, and ensembles, to identify the most important biomarkers for cancer diagnosis. Model performance was compared using standard evaluation metrics: accuracy, precision, recall, and F1 score (macro and micro).
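
As a minimal sketch of how the metrics named above can be computed with scikit-learn, the snippet below scores a small set of made-up labels. The labels and the `summarize` helper are hypothetical and exist only for illustration; they are not taken from the project's data or notebooks.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels for a 3-class problem (illustration only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]


def summarize(y_true, y_pred):
    """Return the evaluation metrics listed in the Methodology section."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_macro": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall_macro": recall_score(y_true, y_pred, average="macro", zero_division=0),
        # Macro F1 averages per-class F1 scores, so every class counts equally.
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
        # Micro F1 pools all decisions; for single-label problems it equals accuracy.
        "f1_micro": f1_score(y_true, y_pred, average="micro"),
    }


scores = summarize(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging is usually the more informative of the two F1 variants when classes are imbalanced, because a rare class weighs as much as a common one.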

Feature selection techniques of both filter and wrapper types were applied, along with a novel feature selection approach. The purpose of feature selection was to identify the most relevant features, those that contribute most to the accuracy of the models. The results of the different methods are discussed in the paper.

Installation

To run the project, follow these steps:

Clone the repository: git clone https://github.com/Adeyeha/Cancer-Biomarkers-ML.git
Install Python 3.x
Install the required dependencies:
- pandas: pip install pandas
- scikit-learn: pip install scikit-learn
- lazypredict: pip install lazypredict
- seaborn: pip install seaborn

Usage

This repository contains a series of Jupyter notebooks demonstrating the process of identifying critical biomarkers for cancer diagnosis using machine learning techniques. The notebooks cover various stages, including data preprocessing, feature selection, model training, and evaluation.

Experiment 1: Baseline Models

Description: This notebook delves into the dataset, conducts data cleaning, and visualizes key insights using Matplotlib and Seaborn. It establishes baseline models on the processed dataset, which serve as benchmarks for subsequent experiments.
Link: Notebook 1 - Baseline Models
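
For readers who want a feel for what a baseline comparison looks like, here is a rough sketch using plain scikit-learn. It assumes a synthetic dataset from `make_classification` as a stand-in for the biomarker matrix, and the two models and the 5-fold macro-F1 scoring are illustrative choices, not necessarily those used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: few samples, many features (assumption).
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10, random_state=0
)

# Two illustrative baselines: one linear model and one ensemble.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated macro F1 for each model, using all features.
baseline = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
    for name, model in models.items()
}

for name, score in baseline.items():
    print(f"{name}: macro F1 = {score:.3f}")
```

Scores obtained this way, with every feature included, give the reference point against which a feature selection method can be judged.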

Experiment 2: Filter Methods

Description: This notebook emphasizes the implementation of feature selection through filter methods and evaluates these methods in comparison to the established baseline.
Link: Notebook 2 - Filter Methods
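
A filter method scores each feature independently of any model and keeps the top-ranked ones. The sketch below uses scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) on synthetic data; the scoring function, `k=20`, and the dataset are assumptions made for illustration and may differ from what the notebook uses.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the biomarker data (assumption).
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10, random_state=0
)

# Baseline: all features.
full_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
full_score = cross_val_score(full_pipe, X, y, cv=5, scoring="f1_macro").mean()

# Filter: keep the 20 features with the highest F-statistic.
# Placing the selector inside the pipeline refits it on each training fold,
# so the held-out fold never influences which features are chosen.
filter_pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=20),
    LogisticRegression(max_iter=1000),
)
filter_score = cross_val_score(filter_pipe, X, y, cv=5, scoring="f1_macro").mean()

# Fit once on all data to inspect which feature indices were kept.
filter_pipe.fit(X, y)
selected = filter_pipe.named_steps["selectkbest"].get_support(indices=True)

print(f"all features: {full_score:.3f}, top 20 by F-test: {filter_score:.3f}")
print("selected feature indices:", selected)
```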

Experiment 3: Wrapper Methods

Description: This notebook focuses on the practical application of feature selection using wrapper methods. It assesses the performance of these methods relative to the baseline.
Link: Notebook 3 - Wrapper Methods
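
Wrapper methods search over feature subsets by repeatedly fitting a model. One common example is Recursive Feature Elimination (RFE), sketched below with scikit-learn. The estimator, the target of 10 features, the step size, and the synthetic data are all illustrative assumptions rather than the notebook's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the biomarker data (assumption).
X, y = make_classification(
    n_samples=120, n_features=100, n_informative=8, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# RFE fits the model, drops the 10 features with the smallest absolute
# coefficients, and repeats until only 10 features remain.
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=10,
    step=10,
)
rfe.fit(X_scaled, y)

selected = np.flatnonzero(rfe.support_)
print("selected feature indices:", selected)
# ranking_ is 1 for kept features; larger values were eliminated earlier.
print("ranking of first 10 features:", rfe.ranking_[:10])
```

Because every elimination round refits the model, wrapper methods cost considerably more than filters on very high-dimensional data; a larger `step` trades some precision for speed.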

Experiment 4: Embedded Methods

Description: This notebook concentrates on feature selection through embedded methods and evaluates their effectiveness compared to the baseline.
Link: Notebook 4 - Embedded Methods
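
Embedded methods obtain feature relevance as a by-product of training a model. The sketch below assumes a random forest whose impurity-based importances drive the selection through scikit-learn's `SelectFromModel`; the model, the cap of 15 features, and the synthetic data are assumptions for illustration, and the notebook may rely on other estimators (for example, L1-regularized models).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the biomarker data (assumption).
X, y = make_classification(
    n_samples=120, n_features=100, n_informative=8, random_state=0
)

# threshold=-np.inf disables the importance cutoff, so max_features alone
# decides how many features are kept: the 15 with the highest importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold=-np.inf,
    max_features=15,
)
selector.fit(X, y)

selected = selector.get_support(indices=True)
importances = selector.estimator_.feature_importances_

print("selected feature indices:", selected)
print("importance of the top feature:", importances.max())

# Reduce the matrix to the selected columns for downstream models.
X_reduced = selector.transform(X)
```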

Experiment 5: Sequential Feature Selection

Description: This notebook showcases the implementation of Sequential Feature Selection.
Link: Notebook 5 - Sequential Feature Selection
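
Sequential Feature Selection greedily adds (forward) or removes (backward) one feature at a time, keeping the change that gives the best cross-validated score. Below is a small sketch using scikit-learn's `SequentialFeatureSelector`; the forward direction, the target of 5 features, the estimator, and the synthetic data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset (assumption); the search cost grows quickly
# with the number of candidate features.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, random_state=0
)

estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward selection: start from no features and add the single best
# feature, judged by 3-fold macro F1, until 5 features are selected.
sfs = SequentialFeatureSelector(
    estimator,
    n_features_to_select=5,
    direction="forward",
    scoring="f1_macro",
    cv=3,
)
sfs.fit(X, y)

selected = sfs.get_support(indices=True)
print("selected feature indices:", selected)
```

With tens of thousands of features, a run like this becomes expensive, so it is often applied after a cheaper filter has already narrowed the candidate set.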

Experiment 6: RFE-Stability Selection

Description: This notebook provides insights into Recursive Feature Elimination with Stability Selection.
Link: Notebook 6 - RFE-Stability Selection
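
The README does not spell out how RFE and stability selection are combined, so the following is only one plausible reading: run RFE on many random subsamples of the data and keep the features that are selected in a large fraction of runs. The function `rfe_stability_selection`, its parameters, the 0.6 frequency threshold, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the biomarker data (assumption).
X, y = make_classification(
    n_samples=150, n_features=60, n_informative=6, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)


def rfe_stability_selection(
    X, y, n_rounds=20, sample_fraction=0.5, n_features_to_select=10, seed=0
):
    """Return, per feature, the fraction of subsampled RFE runs that kept it."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        # Draw a random half of the samples without replacement.
        idx = rng.choice(
            n_samples, size=int(sample_fraction * n_samples), replace=False
        )
        rfe = RFE(
            LogisticRegression(max_iter=1000),
            n_features_to_select=n_features_to_select,
            step=5,
        )
        rfe.fit(X[idx], y[idx])
        counts += rfe.support_  # add 1 for each feature kept in this round
    return counts / n_rounds


frequencies = rfe_stability_selection(X, y)

# Keep features chosen in at least 60% of the rounds (illustrative threshold).
stable = np.flatnonzero(frequencies >= 0.6)
print("stable feature indices:", stable)
```

Features that survive across many subsamples are less likely to be artifacts of one particular train split, which is the motivation for adding a stability criterion on top of a single RFE run.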

Notebooks 7 & 8: Final Analysis & Output

Description: These notebooks comprehensively compare all the aforementioned feature selection methods.
Link: Notebook 7 - Final Analysis
Link: Notebook 8 - Final Output

License

MIT

Contributing

If you want to contribute to this project, please create a pull request with a detailed description of your changes.

Authors

Temitope Adeyeha
Bikram Sahoo
