My first project!!!! Yay!
This repository contains a basic machine learning project to predict heart disease using health metrics. The code includes data preprocessing, exploratory data analysis, model building, evaluation, and saving the trained model.
- Data Loading: Load the dataset from a CSV file.
- Data Preprocessing: Handle missing values and create new categorical features.
- Exploratory Data Analysis (EDA): Visualize data distribution and relationships.
- Data Balancing: Use SMOTE to balance the dataset.
- Model Building: Create a machine learning pipeline and train a logistic regression model.
- Model Evaluation: Evaluate the model using various metrics and cross-validation.
- Model Saving: Save the trained model for future use.
Install the required Python packages:
pip install pandas numpy matplotlib seaborn scikit-learn imbalanced-learn joblib
- Clone the Repository:
git clone https://github.com/yourusername/heart-disease-prediction.git
cd heart-disease-prediction
- Prepare the Data: Place heart.csv in the repository directory.
- Run the Script: Execute the main script:
python main.py
import pandas as pd

def load_data(file_path):
    return pd.read_csv(file_path)
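def preprocess_data(df):
    # Drop rows with missing values; copy so the new column lands on an independent frame
    df = df.dropna().copy()
    # right=False gives left-closed bins: [0, 20), [20, 40), [40, 60), [60, 100)
    age_bins = [0, 20, 40, 60, 100]
    age_labels = ['Youth', 'Young Adult', 'Middle-aged adult', 'Old']
    df['Age_Cat'] = pd.cut(df['Age'], bins=age_bins, labels=age_labels, right=False)
    return df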
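To make the age binning in `preprocess_data` concrete, here is a small sketch that runs the same `pd.cut` call on a handful of ages. The sample ages are made up for illustration; only the bin edges and labels come from the function itself.

```python
import pandas as pd

# Same bin edges and labels as preprocess_data.
age_bins = [0, 20, 40, 60, 100]
age_labels = ['Youth', 'Young Adult', 'Middle-aged adult', 'Old']

# Hypothetical ages chosen to sit on or next to the bin edges.
ages = pd.Series([19, 20, 59, 60, 99, 100], name='Age')

# right=False makes every bin left-closed and right-open,
# so 20 falls in 'Young Adult' and 60 falls in 'Old'.
age_cat = pd.cut(ages, bins=age_bins, labels=age_labels, right=False)

print(pd.DataFrame({'Age': ages, 'Age_Cat': age_cat}))
```

Note that an age of exactly 100 falls outside the last bin `[60, 100)` and comes back as `NaN`, so a dataset with ages of 100 or more would need a wider final edge.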
def plot_data(df):
    # Various plots for EDA
    pass
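The body of `plot_data` is not shown here, so the following is only a guess at what a minimal EDA step could look like: one histogram and one class-count bar chart. The function name `plot_data_sketch`, the `HeartDisease` target column, and the random demo data are all assumptions, and it uses plain matplotlib rather than seaborn to keep the sketch short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_data_sketch(df, target="HeartDisease"):
    """Draw an age histogram next to a bar chart of target class counts."""
    fig, (ax_age, ax_target) = plt.subplots(1, 2, figsize=(10, 4))

    # Distribution of a numeric feature.
    ax_age.hist(df["Age"], bins=20)
    ax_age.set_xlabel("Age")
    ax_age.set_ylabel("Count")

    # Class balance of the target, which motivates the balancing step.
    counts = df[target].value_counts().sort_index()
    ax_target.bar(counts.index.astype(str), counts.values)
    ax_target.set_xlabel(target)
    ax_target.set_ylabel("Count")

    fig.tight_layout()
    return fig


# Random stand-in data; the real heart.csv columns may differ.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(25, 80, size=200),
    "HeartDisease": rng.integers(0, 2, size=200),
})
fig = plot_data_sketch(demo)
```

Calling `fig.savefig(...)` on the returned figure writes the plots to disk when no display is available.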
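from imblearn.over_sampling import SMOTE

def oversample_data(X, y):
    # Named `smote` rather than `os` to avoid shadowing the standard-library os module
    smote = SMOTE(random_state=0)
    X_os, y_os = smote.fit_resample(X, y)
    return X_os, y_os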
def build_pipeline():
    # Pipeline construction
    pass
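The pipeline body is not shown, so here is one plausible shape for it: a scaler followed by logistic regression. The step names, the choice of `StandardScaler`, and the `max_iter` value are assumptions; the only part taken from this README is that the model is a logistic regression inside a pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline_sketch():
    """Scale the features, then fit logistic regression (step choice is an assumption)."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Synthetic stand-in data; the real features come from heart.csv.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

pipeline = build_pipeline_sketch()
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))
```

Keeping the scaler inside the pipeline means it is fitted only on whatever data `fit` receives, which matters once cross-validation is involved.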
def evaluate_model(model, X_test, y_test):
    # Evaluation metrics and plots
    pass
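Since the evaluation body is left out, this sketch shows one way to collect a few standard classification metrics with scikit-learn. Which metrics the project actually reports is not stated, so accuracy, ROC AUC, the confusion matrix, and the name `evaluate_model_sketch` are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate_model_sketch(model, X_test, y_test):
    """Return a dict of common binary-classification metrics."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results = {
        "accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    # Per-class precision, recall, and F1.
    print(classification_report(y_test, y_pred))
    return results


# Synthetic stand-in data and a plain logistic regression for the demo.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
results = evaluate_model_sketch(model, X_test, y_test)
```

For a medical screening task, recall on the positive class is often worth reading alongside accuracy, since a missed case is usually costlier than a false alarm.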
def cross_validate_model(pipeline, X, y):
    # Cross-validation results
    pass
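The cross-validation step could be sketched as below, using stratified folds and `cross_val_score`. The fold count, the accuracy scoring, and the helper name `cross_validate_model_sketch` are assumptions, as is the scaler-plus-logistic-regression pipeline used for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validate_model_sketch(pipeline, X, y, n_splits=5):
    """Score the pipeline on stratified folds and print a short summary."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    print(f"accuracy per fold: {scores.round(3)}")
    print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
    return scores


# Synthetic stand-in data; the real features come from heart.csv.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_validate_model_sketch(pipeline, X, y)
```

If SMOTE is meant to run inside cross-validation, the `Pipeline` class from `imblearn.pipeline` accepts samplers as steps and applies them only while fitting, so each validation fold keeps its original class balance.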
import joblib

def save_model(pipeline, model_path):
    joblib.dump(pipeline, model_path)
    print(f"Model saved to {model_path}")
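To round out the saving step, this sketch saves a fitted pipeline with `joblib.dump`, loads it back with `joblib.load`, and checks that the restored model gives the same predictions. The file name `heart_model.joblib`, the temporary directory, and the demo pipeline are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data and a small fitted pipeline for the demo.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "heart_model.joblib")  # hypothetical file name
    joblib.dump(pipeline, model_path)    # same call save_model makes
    restored = joblib.load(model_path)   # reload for later predictions

# The restored pipeline should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == pipeline.predict(X)).all())
print("predictions match:", same_predictions)
```

joblib files are pickle-based, so only load model files from sources you trust, and ideally with the same scikit-learn version that wrote them.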