
This project employs a Logistic Regression model to predict the risk of Cardiovascular Disease and identify factors that may increase that risk.


jisilvia/Logistic_Regression_Heart_Disease


Predicting the Risk of Cardiovascular Disease using Logistic Regression


In Machine Learning, Logistic Regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable. Unlike Linear Regression, which predicts a continuous outcome, Logistic Regression predicts a categorical outcome and describes the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.
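To make the role of the logistic function concrete, here is a minimal sketch of how a linear score is turned into a probability. The function names `sigmoid` and `predict_proba` and the coefficient values are assumptions chosen for illustration; they are not taken from the project's code.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for a single observation."""
    # Linear score: intercept plus the weighted sum of the features.
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


# Hypothetical coefficients, chosen only so the linear score is exactly 0.
p = predict_proba(features=[1.0, 2.0], coefficients=[0.5, -0.25], intercept=0.0)
print(p)  # 0.5, because sigmoid(0) is 0.5
```

A probability at or above a chosen threshold (commonly 0.5) is then read as the positive class, which is what makes the model a binary classifier.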

Project Description

This project employs a Logistic Regression model to predict the risk of Cardiovascular Disease based on 16 variables such as age, waist circumference, and preexisting health conditions. First, a binary classification model is created and optimized to predict whether risks are present. Next, the coefficients of all variables are extracted and ordered by importance to understand which factors most influence the development of heart disease. Lastly, the model's performance is evaluated using measures including the Accuracy score, Precision, Recall, and AUC score.
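The coefficient-ranking step described above might look roughly like the sketch below. It is not the project's actual code: it uses a synthetic dataset from `make_classification` in place of cvd_data.csv, the feature names are hypothetical, and ranking by absolute coefficient after standardizing the inputs is one common choice that is assumed here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: 5 features instead of 16).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize so that coefficient magnitudes are comparable across features.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)

# For a binary target, coef_ has shape (1, n_features).
ranking = (
    pd.DataFrame({"feature": feature_names, "coefficient": model.coef_[0]})
    .assign(abs_coefficient=lambda d: d["coefficient"].abs())
    .sort_values("abs_coefficient", ascending=False)
    .reset_index(drop=True)
)
print(ranking)
```

A positive coefficient raises the predicted log-odds of the positive class as the feature increases, and a negative one lowers it, so the sign is worth reading alongside the magnitude.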

Steps

  1. Building and optimizing Logistic Regression model
  2. Extracting Features and their Influence
  3. Performance Evaluation
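As a rough illustration of step 3, the sketch below fits a model and computes the measures named in the project description. It again assumes a synthetic dataset in place of cvd_data.csv, and the use of a scikit-learn pipeline is a convenience chosen for this example, not something confirmed by the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling and the classifier are fitted together on the training split only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard 0/1 predictions
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

cm = confusion_matrix(y_test, y_pred)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),  # AUC uses scores, not hard labels
}

print(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, Precision, and Recall depend on the classification threshold, while the AUC score summarizes ranking quality across all thresholds, which is why it is computed from the predicted probabilities.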

Requirements

Python. Python is an interpreted, high-level and general-purpose programming language.

Integrated Development Environment (IDE). Any IDE that can be used to view, edit, and run Python code.

Packages

Install pandas, NumPy, scikit-learn, seaborn, and matplotlib before running the code. The script imports them as follows:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt

Launch

Download the Python file CA05-A_Logistic_Regression and open it in your IDE. Then download the dataset cvd_data.csv and import it into the script.

Authors

Silvia Ji - GitHub

License

This project is licensed under the MIT License.

Acknowledgements

The project template and dataset were provided by Arin Brahma at Loyola Marymount University.
