Ysjin33/Heart-Disease-Data-Analysis


Heart Disease Prediction Model

My first project!!!! Yay!

This repository contains a basic machine learning project to predict heart disease using health metrics. The code includes data preprocessing, exploratory data analysis, model building, evaluation, and saving the trained model.

Project Steps

  1. Data Loading: Load the dataset from a CSV file.
  2. Data Preprocessing: Handle missing values and create new categorical features.
  3. Exploratory Data Analysis (EDA): Visualize data distribution and relationships.
  4. Data Balancing: Use SMOTE to balance the dataset.
  5. Model Building: Create a machine learning pipeline and train a logistic regression model.
  6. Model Evaluation: Evaluate the model using various metrics and cross-validation.
  7. Model Saving: Save the trained model for future use.

Requirements

Install the required Python packages:

pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn joblib

Usage

  1. Clone the Repository:

    git clone https://github.com/Ysjin33/Heart-Disease-Data-Analysis.git
    cd Heart-Disease-Data-Analysis
  2. Prepare the Data:

    Place heart.csv in the repository directory.

  3. Run the Script:

    Execute the main script:

    python main.py
    

Main Functions

Data Loading

import pandas as pd

def load_data(file_path):
    return pd.read_csv(file_path)

Data Preprocessing

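def preprocess_data(df):
    # Drop rows with missing values; the explicit copy avoids a
    # SettingWithCopyWarning on some pandas versions when adding a column
    df = df.dropna().copy()
    # Left-closed age bands: [0, 20), [20, 40), [40, 60), [60, 100)
    age_bins = [0, 20, 40, 60, 100]
    age_labels = ['Youth', 'Young Adult', 'Middle-aged adult', 'Old']
    df['Age_Cat'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)
    return df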
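
To see how the left-closed bins behave at their boundaries, here is a small usage sketch. The sample ages are made up for illustration; only the `Age` column name, the bin edges, and the labels come from the function above.

```python
import pandas as pd

# Same bin edges and labels as preprocess_data above
age_bins = [0, 20, 40, 60, 100]
age_labels = ['Youth', 'Young Adult', 'Middle-aged adult', 'Old']

# Made-up ages chosen to sit on the bin boundaries
ages = pd.Series([19, 20, 40, 59, 60, 100], name='Age')

# right=False makes each bin left-closed: [0, 20), [20, 40), [40, 60), [60, 100)
age_cat = pd.cut(ages, bins=age_bins, labels=age_labels, right=False)

print(pd.DataFrame({'Age': ages, 'Age_Cat': age_cat}))
```

An age of exactly 100 falls outside the last bin and comes back as NaN, so a dataset containing such ages would need a wider upper edge.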

Data Visualization

def plot_data(df):
    # Various plots for EDA (body omitted in this README)
    ...
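
The README does not show which plots `main.py` draws. As a rough sketch only, an EDA function of this kind might look like the following. The `HeartDisease` target column name and the choice of plots are assumptions; only the `Age` column is confirmed by the preprocessing code.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_data(df, target="HeartDisease"):
    """Hypothetical EDA sketch; the actual plots in main.py are not shown here."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram of ages
    axes[0].hist(df["Age"], bins=20)
    axes[0].set(title="Age distribution", xlabel="Age", ylabel="Count")

    # Bar chart of class counts, which shows how imbalanced the target is
    counts = df[target].value_counts().sort_index()
    axes[1].bar(counts.index.astype(str), counts.values)
    axes[1].set(title="Class balance", xlabel=target, ylabel="Count")

    fig.tight_layout()
    return fig
```

Returning the figure instead of calling `plt.show()` leaves it to the caller to display it or save it with `fig.savefig(...)`.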

Data Balancing

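from imblearn.over_sampling import SMOTE

def oversample_data(X, y):
    # Named `smote` rather than `os` to avoid shadowing the standard-library module
    smote = SMOTE(random_state=0)
    X_os, y_os = smote.fit_resample(X, y)
    return X_os, y_os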

Model Pipeline

def build_pipeline():
    # Pipeline construction (body omitted in this README)
    ...
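
The pipeline body is not shown in this README. A minimal sketch, assuming a scaling step followed by the logistic regression model named in the project steps, could be written as below. The `StandardScaler` step, the step names, and `max_iter=1000` are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline():
    """Hypothetical pipeline: scale features, then fit logistic regression."""
    return Pipeline([
        # Assumption: all input features are numeric
        ("scaler", StandardScaler()),
        # max_iter raised above the default to help the solver converge
        ("clf", LogisticRegression(max_iter=1000)),
    ])
```

Categorical columns such as `Age_Cat` would need encoding first, for example with `OneHotEncoder` inside a `ColumnTransformer`, before a pipeline like this could use them.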

Model Evaluation

def evaluate_model(model, X_test, y_test):
    # Evaluation metrics and plots (body omitted in this README)
    ...
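
The README does not list which metrics `main.py` reports. The sketch below picks a common set for binary classification (accuracy, ROC AUC, confusion matrix, and a per-class report) as an assumption, leaves the plots out, and runs on synthetic data rather than `heart.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate_model(model, X_test, y_test):
    """Hypothetical metric set; the README does not list the ones main.py uses."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    print(classification_report(y_test, y_pred))
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }


# Usage on synthetic data standing in for heart.csv
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
results = evaluate_model(model, X_test, y_test)
```

Returning the metrics as a dictionary keeps them available for logging or comparison instead of only printing them.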

Cross-Validation

def cross_validate_model(pipeline, X, y):
    # Cross-validation results (body omitted in this README)
    ...
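
The cross-validation settings are not given in this README. As an illustration, the sketch below assumes 5-fold stratified cross-validation scored by accuracy. The `n_splits` parameter, the scoring choice, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validate_model(pipeline, X, y, n_splits=5):
    """Hypothetical sketch: stratified k-fold accuracy for an unfitted pipeline."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
    return scores


# Usage on synthetic data standing in for heart.csv
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_validate_model(pipeline, X, y)
```

If SMOTE is part of the workflow, imbalanced-learn's own `Pipeline` (from `imblearn.pipeline`) lets the oversampler run inside each training fold only, so validation folds stay free of synthetic samples.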

Model Saving

import joblib

def save_model(pipeline, model_path):
    joblib.dump(pipeline, model_path)
    print(f"Model saved to {model_path}")
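
For the "future use" part, a saved model can be read back with `joblib.load`. The round trip below is a small sketch on synthetic data; the `heart_model.joblib` file name and the temporary directory are placeholders, not paths used by this repository.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model on synthetic data
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save to a temporary path (hypothetical file name), then load it back
model_path = os.path.join(tempfile.mkdtemp(), "heart_model.joblib")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# The reloaded model should reproduce the original predictions
print((loaded.predict(X) == model.predict(X)).all())
```

Files written by joblib are pickle-based, so they should only be loaded from sources you trust, ideally with the same scikit-learn version that saved them.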
