In Machine Learning, Logistic Regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable. Unlike Linear Regression models which use continuous data, Logistic Regression models are able to use categorical datasets and explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
This project employs a Logistic Regression model with the objective to predict the risk of Cardiovascular Disease based on 16 variables such as age, waist citcumference, and preexisting health conditions. First, a binary classification model is created and optimized to predict whether risks are present. Next, the coefficients of all variables are extracted and ordered by importance to understand which factors most influence the development of heart disease. Lastly, the model's performance is evaluated using measures including the Accuracy score, Precision, Recall, and AUC score.
- Building and optimizing Logistic Regression model
- Extracting Features and their Influence
- Performance Evaluation
Python. Python is an interpreted, high-level and general-purpose programming language.
Integrated Development Environment (IDE). Any IDE that can be used to view, edit, and run Python code, such as:
Install the following packages in Python prior to running the code.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, roc_auc_score
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt
Download the Python File CA05-A_Logistic_Regression and open it in the IDE. Download and import the dataset cvd_data.csv.
This project is licensed under the MIT License.
The project template and dataset provided by Arin Brahma at Loyola Marymount University.