Credit Risk Analysis - Supervised Learning

Overview of the Analysis

In this project, credit risk is an unbalanced classification problem because good loans outnumber risky loans. To make the analysis more accurate, we apply several techniques for training and evaluating models with unbalanced classes. A dataset from LendingClub, a peer-to-peer lending services company, is used with the following:

Supervised Machine Learning Models Utilized:

Naive Random Oversampling

SMOTE Oversampling

Cluster Centroids Undersampling

SMOTEENN Combination (Over and Under) Sampling

Balanced Random Forest Classifier

Easy Ensemble AdaBoost Classifier

Resources Utilized to Complete Analysis:

Data Source: LoanStats_2019Q1.csv

Tools: Jupyter Notebook, MS Excel

Language: Python

Python Dependencies: pandas, pathlib, numpy, scikit-learn, imbalanced-learn

Results:

Naive Random Oversampling

Balanced Accuracy Score

Confusion Matrix

Classification Report

SMOTE Oversampling

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Cluster Centroids Undersampling

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Combination (Over and Under) Sampling

This section tests a combined over- and under-sampling algorithm. Below are the results of resampling the data with the SMOTEENN algorithm.

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Ensemble Learners

Balanced Random Forest Classifier

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Easy Ensemble AdaBoost Classifier

Balanced Accuracy Score

Confusion Matrix

Imbalanced Classification Report

Summary:

Several machine learning models were evaluated to find the most effective one for predicting credit risk. For each model, the accuracy, precision, and sensitivity were reviewed. These three figures follow directly from each model's confusion matrix.
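To make the link between the confusion matrix and the reported scores concrete, the sketch below derives each metric from the four confusion-matrix counts. The function name `metrics_from_confusion` and the counts passed to it are hypothetical and are not results from this analysis.

```python
def metrics_from_confusion(tp, fn, fp, tn):
    """Derive metrics for the positive (high-risk) class from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity: share of risky loans caught
    specificity = tn / (tn + fp)       # share of good loans correctly cleared
    precision = tp / (tp + fp)         # share of flagged loans that are truly risky
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts for illustration only: 100 risky loans, 10,000 good loans.
metrics = metrics_from_confusion(tp=60, fn=40, fp=2000, tn=8000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, precision for the rare class stays low even though recall is moderate, because the many good loans produce far more false positives than there are risky loans to find. Balanced accuracy averages recall and specificity, so it is not dominated by the majority class the way plain accuracy is.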

Each model's results differ from the others. The precision scores for all the models appear overfit, so precision should be read together with the recall and accuracy scores. The model recommended for credit risk analysis is the Easy Ensemble AdaBoost Classifier.
