Credit Risk Analysis

Overview

In my experience working at a financial institution, writing consolidation loans for individuals who could not pay their loans, financial risk, including credit risk, is an inherently unbalanced classification problem, as good loans easily outnumber risky loans. Different techniques are therefore needed to train and evaluate models with unbalanced classes. In this analysis, credit card data is oversampled using the RandomOverSampler and SMOTE algorithms and undersampled using the ClusterCentroids algorithm. A combined approach of over- and undersampling is then applied using the SMOTEENN algorithm. Next, the machine learning models BalancedRandomForestClassifier and EasyEnsembleClassifier are used to predict credit risk. Finally, the performance of these models is evaluated, followed by a written recommendation on whether they should be used to predict credit risk.
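To make the idea of naive random oversampling concrete, here is a minimal sketch in plain Python. It is a simplified illustration of the concept, not the imbalanced-learn RandomOverSampler implementation used in the notebooks; the function name random_oversample and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Because the added rows are exact copies of existing high-risk rows, this approach adds no new information to the training data. SMOTE differs in that it synthesizes new minority-class points by interpolating between neighbors.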

Results

Naive Random Oversampling Results: The balanced accuracy score is 65.72%. The precision score for high risk is very low at 1%, and the recall is 62%.

SMOTE Oversampling Results: The balanced accuracy score is 64.78%. The precision score for high risk is very low at 1%, and the recall is 68%.

Undersampling Results: The balanced accuracy score is 54.43%. The precision score is 99%, and the recall is 40%.

Combination (Undersampling and Oversampling) Results: The balanced accuracy score is 64.47%. The precision score is 99%, and the recall is 57%.

Balanced Random Forest Classifier Results: The balanced accuracy score is 77.38%. The precision score is 99%, and the recall is 87%.

Easy Ensemble AdaBoost Classifier Results: The balanced accuracy score is 93.17%. The precision score is 99%, and the recall is 94%.
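As a rough illustration of how these three metrics relate, the sketch below computes precision, recall, and balanced accuracy from confusion-matrix counts. The counts are invented for illustration and are not taken from the notebooks; they are chosen only to show how a heavily unbalanced test set can produce roughly 1% precision for the high-risk class alongside a moderate recall and balanced accuracy. The function name class_metrics is hypothetical.

```python
def class_metrics(tp, fp, fn, tn):
    """Return (precision, recall, balanced_accuracy) for the positive class.

    tp/fn count actual positives (high risk) predicted correctly/incorrectly;
    tn/fp count actual negatives (low risk) predicted correctly/incorrectly.
    """
    precision = tp / (tp + fp)       # share of predicted high-risk that is truly high risk
    recall = tp / (tp + fn)          # share of actual high-risk loans that were caught
    specificity = tn / (tn + fp)     # recall of the low-risk class
    balanced_accuracy = (recall + specificity) / 2
    return precision, recall, balanced_accuracy


# Hypothetical test set: 100 high-risk loans and 17,000 low-risk loans.
tp, fn = 60, 40          # 60 of the 100 high-risk loans are flagged
fp, tn = 5100, 11900     # 5,100 low-risk loans are wrongly flagged

precision, recall, balanced_accuracy = class_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2%} recall={recall:.2%} "
      f"balanced_accuracy={balanced_accuracy:.2%}")
```

With so few high-risk loans, even a modest false-positive rate on the low-risk class swamps the true positives, which is why high-risk precision can stay near 1% while balanced accuracy looks reasonable.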

Summary

The first four models used oversampling, undersampling, and a combination of the two to resample the credit card data, with the goal of determining which approach is most effective at predicting high-risk loans. The last two models are ensemble classifiers, which predict whether a loan is high risk or low risk. The first four models have lower balanced accuracy scores than the ensemble classifiers (54.43% to 65.72%, compared with 77.38% and 93.17%), and their recall is lower as well (40% to 68%, compared with 87% and 94%). The ensemble classifiers have the best balance of precision and recall, which is preferable in a model. Therefore, I recommend the Easy Ensemble Classifier model, which has the highest balanced accuracy (93.17%) and recall (94%).
