This project uses data analytics and machine learning to assess and predict credit risk. It includes data preprocessing, exploratory analysis, feature engineering, and model building with Python, Pandas, and Scikit-learn to help financial institutions make informed decisions.

rpamintuan671/Credit_Risk_Analysis

Credit Risk Analysis - Supervised Learning

Overview of the Analysis

In this project, credit risk is treated as an unbalanced classification problem because good loans far outnumber risky loans. To make the analysis more accurate, we apply several techniques for training and evaluating models with unbalanced classes. A dataset from LendingClub, a peer-to-peer lending services company, is used with the following:
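To see why class imbalance calls for special handling, consider the small sketch below. The class counts are hypothetical and chosen only for illustration (they are not taken from the LendingClub data), and the "model" is a stand-in that always predicts the majority class.

```python
# Hypothetical class counts chosen for illustration; not from LoanStats_2019Q1.csv.
n_low_risk = 990
n_high_risk = 10

y_true = ["low_risk"] * n_low_risk + ["high_risk"] * n_high_risk
# A stand-in "model" that ignores its inputs and always predicts the majority class.
y_pred = ["low_risk"] * len(y_true)


def recall_for(label, y_true, y_pred):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


# Plain accuracy: share of all predictions that are correct.
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Balanced accuracy: mean of the per-class recalls.
balanced_accuracy = (
    recall_for("low_risk", y_true, y_pred) + recall_for("high_risk", y_true, y_pred)
) / 2

print(f"accuracy          = {accuracy:.3f}")
print(f"balanced accuracy = {balanced_accuracy:.3f}")
```

Plain accuracy rewards the stand-in model for the sheer number of good loans even though it never flags a risky one, which is why a balanced accuracy score is reported for each technique.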

Supervised Machine Learning Models Utilized:

  • Naive Random OverSampler
  • SMOTE Oversampling
  • Cluster Centroids Undersampling
  • SMOTEENN Combination (Over and Under) Sampling
  • Balanced Random Forest Classifier
  • Easy Ensemble AdaBoost Classifier

Resources Utilized to Complete Analysis:

  • Data Source: LoanStats_2019Q1.csv
  • Tools: Jupyter Notebook, MS Excel
  • Language: Python
  • Python Dependencies: pandas, pathlib, numpy, scikit-learn, imbalanced-learn

Results:

Naive Random Oversampling

Balanced Accuracy Score

Confusion Matrix

Classification Report

SMOTE Oversampling

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Undersampling

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Combination (Over and Under) Sampling

This section tests a combined over- and under-sampling algorithm. Below are the results of resampling the data with the SMOTEENN algorithm.

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Ensemble Learners

Balanced Random Forest Classifier

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Easy Ensemble AdaBoost Classifier

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Summary:

Various machine learning models were evaluated to determine which is most effective at predicting credit risk. In this analysis, the accuracy, precision, and sensitivity were reviewed for each model. The confusion matrix for each model is consistent with its accuracy, precision, and sensitivity results.

Each model's results differ from the others. The precision scores appear inflated across all of the models, so they should be interpreted together with the recall and accuracy scores. Of the models tested, the Easy Ensemble AdaBoost Classifier is the recommended model for credit risk analysis.
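The relationship between a confusion matrix and the metrics reviewed in this summary can be worked through by hand. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from any model in this project. High-risk loans are treated as the positive class.

```python
# Hypothetical confusion-matrix counts (not results from this project).
# Rows are the actual class, columns are the predicted class.
tp, fn = 60, 40      # actual high risk: predicted high risk / predicted low risk
fp, tn = 900, 9000   # actual low risk:  predicted high risk / predicted low risk

# Precision: of the loans flagged as high risk, how many really were.
precision = tp / (tp + fp)

# Sensitivity (recall): of the truly high-risk loans, how many were flagged.
sensitivity = tp / (tp + fn)

# Specificity: recall for the low-risk class.
specificity = tn / (tn + fp)

# Balanced accuracy: mean of the recalls of the two classes.
balanced_accuracy = (sensitivity + specificity) / 2

print(f"precision         = {precision:.4f}")
print(f"sensitivity       = {sensitivity:.4f}")
print(f"specificity       = {specificity:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
```

With these made-up counts, precision for the high-risk class is low even though most risky loans are caught, which illustrates why no single metric should be read on its own.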
