Credit_Risk_Analysis

Overview of the analysis:

This exercise trains and evaluates several machine learning models that predict credit risk from data with unbalanced classes. The analysis uses the following algorithms:

  • RandomOverSampler and SMOTE, to oversample the training data.
  • ClusterCentroids, to undersample the training data.
  • SMOTEENN, a combined over- and undersampling algorithm, to resample the training data.
  • BalancedRandomForestClassifier and EasyEnsembleClassifier, ensemble classifiers used to reduce bias.
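
To make the idea behind oversampling concrete, here is a minimal standard-library sketch of random oversampling: minority-class rows are duplicated at random until every class matches the size of the largest class. It only illustrates the concept. It is not imbalanced-learn's RandomOverSampler implementation, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement to fill the gap up to the majority count.
        for _ in range(target - n):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy training set: 8 low_risk rows and 2 high_risk rows.
X = [[i] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have the same number of rows
```

Resampling of this kind is applied to the training data only, so that the test set keeps the original class balance.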

Results:

We use the balanced accuracy score, the confusion matrix, and the imbalanced classification report to compare the results.
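
As a sketch of how these metrics relate to one another, the snippet below derives precision, sensitivity (recall), F1, and balanced accuracy from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical, chosen only for illustration. They are not the confusion matrices from this analysis.

```python
def binary_metrics(tp, fn, fp, tn):
    """Metrics for the positive class (here: high_risk) from
    confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)   # recall of the negative class
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: 100 high_risk loans and 10,000 low_risk loans.
m = binary_metrics(tp=60, fn=40, fp=3500, tn=6500)
for name, value in m.items():
    print(f"{name:18s} {value:.3f}")
```

Because F1 is the harmonic mean of precision and sensitivity, a very low precision keeps F1 low even when sensitivity is high.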

RandomOverSampler

The balanced accuracy score is 62%.
The high_risk precision is only about 1%, with 60% sensitivity, which gives an F1 score of only 2%.
Because the low_risk class dominates the data, the low_risk precision is almost 100%, with a sensitivity of 65%.

SMOTE

The balanced accuracy score is 65%.
The high_risk precision is only about 1%, with 64% sensitivity, which gives an F1 score of only 2%.
Because the low_risk class dominates the data, the low_risk precision is almost 100%, with a sensitivity of 66%.
These results are very similar to those of RandomOverSampler.

ClusterCentroids

The balanced accuracy score drops to 52%.
The high_risk precision is only about 1%, with 59% sensitivity, which gives an F1 score of only 1%.
Because the low_risk class dominates the data, the low_risk precision is almost 100%, with a sensitivity of 46%.

Combinatorial SMOTEENN

The balanced accuracy score is 62%.
The high_risk precision is only about 1%, with 69% sensitivity, which gives an F1 score of 2%.
Because the low_risk class dominates the data, the low_risk precision is almost 100%, with a sensitivity of 54%.

BalancedRandomForestClassifier

The balanced accuracy score improves greatly, to 79%.
The high_risk precision is only about 4%, with 67% sensitivity, which gives an F1 score of 7%.
With fewer low_risk loans falsely flagged as high_risk, the low_risk sensitivity rises to 91%, and the low_risk precision remains almost 100%.

EasyEnsembleClassifier

The balanced accuracy score is very high, at 93%.
The high_risk precision is only about 7%, with 91% sensitivity, which gives an F1 score of 14%.
With fewer low_risk loans falsely flagged as high_risk, the low_risk sensitivity rises to 94%, and the low_risk precision remains almost 100%.
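
The balanced accuracy score is the mean of the per-class sensitivities, so the figures reported for the six models can be cross-checked against each other. The short sketch below does this using only the rounded percentages quoted in the results. Because the inputs are rounded, small differences of up to about one point are expected.

```python
# (high_risk sensitivity, low_risk sensitivity, reported balanced accuracy),
# all in percent, copied from the results reported for each model.
reported = {
    "RandomOverSampler": (60, 65, 62),
    "SMOTE": (64, 66, 65),
    "ClusterCentroids": (59, 46, 52),
    "SMOTEENN": (69, 54, 62),
    "BalancedRandomForestClassifier": (67, 91, 79),
    "EasyEnsembleClassifier": (91, 94, 93),
}

for name, (high, low, score) in reported.items():
    # Balanced accuracy = average of the recall obtained on each class.
    mean_recall = (high + low) / 2
    print(f"{name:32s} mean recall {mean_recall:5.1f}%   reported {score}%")
```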

Summary:

  • All the models we used to predict credit risk show weak precision in identifying high-risk credits.
  • The ensemble models show a great improvement, especially in sensitivity for high-risk credits.
  • Although the EasyEnsembleClassifier model detects almost all high-risk credits, its low precision means that many low-risk credits are still falsely flagged as high risk, which may cost the bank business opportunities.
  • The bank may be able to find models other than those above that predict credit risk better.
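
The low high-risk precision in the points above follows largely from how rare high-risk credits are. The sketch below computes precision from the class prevalence and the two sensitivities, using the EasyEnsembleClassifier figures reported above (91% for high_risk, 94% for low_risk). The share of high-risk credits in the data is not stated in this README, so the prevalence values are assumptions for illustration only, and `precision_from_rates` is a hypothetical helper name.

```python
def precision_from_rates(prevalence, sensitivity, specificity):
    """Precision of the positive class given its prevalence, its
    sensitivity, and the negative class's sensitivity (specificity)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumed prevalences of high-risk credits (not taken from the data).
for prevalence in (0.005, 0.05, 0.5):
    p = precision_from_rates(prevalence, sensitivity=0.91, specificity=0.94)
    print(f"prevalence {prevalence:6.1%} -> high_risk precision {p:.1%}")
```

Under these assumptions, a prevalence of about 0.5% yields a precision close to the reported 7%, while the same classifier applied to balanced classes would have a precision above 90%. This suggests that the weak precision reflects the class imbalance as much as the choice of model.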