This project performs data analysis and classification modeling to predict loan approval status based on applicant financial and demographic information. Using Python libraries such as Pandas, Seaborn, and Scikit-Learn, we analyze various factors impacting loan approval, preprocess the dataset, and build machine learning models to classify loans as approved or rejected.
Features:
Data Cleaning and Transformation: Combines asset columns into "Movable" and "Immovable" assets and maps categorical features to numerical values.
Exploratory Data Analysis (EDA): Visualizes distributions and relationships between key features like loan term, annual income, assets, and credit score, using histograms, box plots, and scatter plots.
Correlation Analysis: Examines feature correlations and their impact on loan approval.
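The cleaning step described under Data Cleaning and Transformation could look roughly like the sketch below. The column names, the category labels, and the choice of which assets count as movable (bank and luxury) versus immovable (residential and commercial) are all assumptions for illustration; the project's actual dataset may differ.

```python
import pandas as pd

# Tiny made-up sample. Column names and values are hypothetical,
# not taken from the project's dataset.
df = pd.DataFrame({
    "residential_assets_value": [2_400_000, 2_700_000],
    "commercial_assets_value": [17_600_000, 2_200_000],
    "luxury_assets_value": [22_700_000, 8_800_000],
    "bank_asset_value": [8_000_000, 3_300_000],
    "education": ["Graduate", "Not Graduate"],
    "self_employed": ["No", "Yes"],
    "loan_status": ["Approved", "Rejected"],
})

ASSET_COLUMNS = [
    "residential_assets_value",
    "commercial_assets_value",
    "luxury_assets_value",
    "bank_asset_value",
]


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Combine asset columns and encode categorical columns as integers."""
    out = frame.copy()

    # Assumed grouping: bank + luxury = movable, residential + commercial = immovable.
    out["movable_assets"] = out["bank_asset_value"] + out["luxury_assets_value"]
    out["immovable_assets"] = (
        out["residential_assets_value"] + out["commercial_assets_value"]
    )
    out = out.drop(columns=ASSET_COLUMNS)

    # Map each categorical feature to 0/1 (assumed label spellings).
    out["education"] = out["education"].map({"Not Graduate": 0, "Graduate": 1})
    out["self_employed"] = out["self_employed"].map({"No": 0, "Yes": 1})
    out["loan_status"] = out["loan_status"].map({"Rejected": 0, "Approved": 1})
    return out


clean = preprocess(df)
print(clean)
```

Note that `Series.map` returns `NaN` for any label missing from the dictionary, so checking `clean.isna().sum()` after mapping is a cheap way to catch unexpected spellings or stray whitespace in the raw data.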
Machine Learning Models:
- Decision Tree Classifier
- Random Forest Classifier
Model Evaluation:
- Confusion Matrix
- Classification Report
- R² Score
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
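As a minimal sketch of how the two models and the metrics listed above fit together, the example below trains both classifiers on synthetic data and computes each metric with Scikit-Learn. The features, the approval rule, the split ratio, and the hyperparameters are invented for illustration and are not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical features and approval rule).
rng = np.random.default_rng(0)
n = 400
credit_score = rng.integers(300, 900, size=n)
movable_assets = rng.uniform(0, 5e7, size=n)
X = np.column_stack([credit_score, movable_assets])
y = (credit_score > 550).astype(int)  # 1 = approved, 0 = rejected

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),
        "report": classification_report(y_test, pred, labels=[0, 1]),
        "accuracy": accuracy_score(y_test, pred),
        # Regression-style metrics computed on the 0/1 labels.
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(name)
    print(results[name]["confusion_matrix"])
    print(results[name]["report"])
```

R², MSE, and MAE are regression metrics. When applied to 0/1 class labels, MSE and MAE both reduce to the misclassification rate (1 minus accuracy), so the confusion matrix and classification report are usually the more informative outputs for a classifier.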
Key Insights:
- Visualizations reveal that higher credit scores and larger assets generally increase the likelihood of loan approval.
- A correlation heatmap shows how strongly each feature is related to the others and to loan approval status, which helps guide feature selection.
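The correlation heatmap mentioned above can be produced along the following lines with Pandas and Seaborn. This is only a sketch on synthetic data: the column names and the rule that generates `loan_status` are assumptions, chosen so that credit score correlates with approval.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical columns and approval rule).
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "credit_score": rng.integers(300, 900, size=n),
    "income_annum": rng.uniform(2e5, 1e7, size=n),
    "loan_term": rng.integers(2, 21, size=n),
})
data["loan_status"] = (data["credit_score"] > 550).astype(int)

# Pairwise Pearson correlations between all numeric columns.
corr = data.corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Feature correlation matrix")
fig.tight_layout()
# fig.savefig("correlation_heatmap.png")  # uncomment to write the figure to disk

# Features ranked by absolute correlation with the target.
ranking = corr["loan_status"].drop("loan_status").abs().sort_values(ascending=False)
print(ranking)
```

Pearson correlation only captures linear relationships, so a low value does not by itself mean a feature is useless to a tree-based model; the ranking is best treated as a first screening step.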
This project is ideal for understanding data-driven decision-making in the loan approval process and demonstrates practical skills in data analysis, machine learning, and model evaluation.