Customer churn is a critical concern for banks, as retaining existing customers is often more cost-effective than acquiring new ones. Predicting churn helps banks identify at-risk customers and take proactive measures to retain them. This project uses machine learning to predict customer churn from the attributes provided in the dataset. By accurately identifying customers likely to churn, banks can implement targeted retention strategies, improving customer satisfaction and reducing revenue loss.

The dataset contains the following attributes:
- Customer ID: Unique identifier for each customer.
- Surname: Customer's surname.
- Credit Score: Numerical value representing the customer's credit score.
- Geography: Country where the customer resides (France, Spain, or Germany).
- Gender: Customer's gender (Male or Female).
- Age: Customer's age.
- Tenure: Number of years the customer has been with the bank.
- Balance: Customer's account balance.
- NumOfProducts: Number of bank products the customer uses (e.g., savings account, credit card).
- HasCrCard: Whether the customer has a credit card (1 = yes, 0 = no).
- IsActiveMember: Whether the customer is an active member (1 = yes, 0 = no).
- EstimatedSalary: Estimated salary of the customer.
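The attribute list above can be expressed as a typed record with basic checks on the documented categories and flags. This is a minimal sketch: `CustomerRecord` and its snake_case field names are hypothetical, not taken from the repository's code.

```python
from dataclasses import dataclass

# Allowed categorical values, as listed in the dataset description.
GEOGRAPHIES = {"France", "Spain", "Germany"}
GENDERS = {"Male", "Female"}


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical typed view of one row of the churn dataset."""

    customer_id: int
    surname: str
    credit_score: int
    geography: str
    gender: str
    age: int
    tenure: int
    balance: float
    num_of_products: int
    has_cr_card: int
    is_active_member: int
    estimated_salary: float

    def __post_init__(self) -> None:
        # Reject values outside the documented categories.
        if self.geography not in GEOGRAPHIES:
            raise ValueError(f"unknown geography: {self.geography!r}")
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender!r}")
        # Binary flags are encoded as 1 = yes, 0 = no.
        for name in ("has_cr_card", "is_active_member"):
            if getattr(self, name) not in (0, 1):
                raise ValueError(f"{name} must be 0 or 1")
        # Age and tenure (years with the bank) cannot be negative.
        if self.age < 0 or self.tenure < 0:
            raise ValueError("age and tenure must be non-negative")
```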
This repository provides an end-to-end machine learning pipeline for predicting customer churn. It consists of the following stages:
Data Preprocessing:
- Handle missing values.
- Encode categorical variables.
- Scale numerical features.
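The three preprocessing steps can be illustrated without any ML library. This is a dependency-free sketch that assumes rows are plain dicts keyed by column name; the helper names and the choice of median imputation and z-score scaling are illustrative assumptions, not necessarily what the repository does. In practice, scikit-learn's `SimpleImputer`, `OneHotEncoder`, and `StandardScaler` cover the same ground.

```python
import statistics


def impute_median(rows, column):
    """Replace missing (None) values in a numeric column with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column, categories):
    """Replace a categorical column with one 0/1 indicator column per category."""
    out = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            encoded[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        out.append(encoded)
    return out


def standardize(rows, column):
    """Scale a numeric column to zero mean and unit (population) standard deviation."""
    values = [r[column] for r in rows]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for constant columns
    return [{**r, column: (r[column] - mean) / std} for r in rows]


if __name__ == "__main__":
    rows = [
        {"Age": 30, "Geography": "France", "Balance": 0.0},
        {"Age": None, "Geography": "Spain", "Balance": 50000.0},
        {"Age": 50, "Geography": "Germany", "Balance": 100000.0},
    ]
    rows = impute_median(rows, "Age")
    rows = one_hot(rows, "Geography", ["France", "Spain", "Germany"])
    rows = standardize(rows, "Age")
    print(rows)
```

Imputation runs before scaling so that the filled-in values are scaled along with the rest of the column.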
Feature Engineering:
- Extract relevant features.
- Perform feature scaling if necessary.
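As one illustration of feature extraction, the sketch below drops the identifier columns and derives a few extra columns from the raw attributes. The derived features (`BalanceToSalary`, `ZeroBalance`, `ProductsPerTenureYear`) and the CamelCase column keys (`CustomerId`, `EstimatedSalary`, and so on) are hypothetical choices for illustration, not features the repository is known to use.

```python
# Hypothetical identifier columns: they label customers rather than describe
# behaviour, so this sketch assumes they are excluded from the model inputs.
ID_COLUMNS = ("CustomerId", "Surname")


def engineer_features(row):
    """Return a copy of one customer row with hypothetical derived features added.

    Column names are assumed to follow the dataset's CamelCase style
    (e.g. "Balance", "EstimatedSalary", "NumOfProducts", "Tenure").
    """
    features = {k: v for k, v in row.items() if k not in ID_COLUMNS}
    salary = row["EstimatedSalary"]
    # Ratio of account balance to salary; guard against a zero salary.
    features["BalanceToSalary"] = row["Balance"] / salary if salary else 0.0
    # Flag customers whose account balance is exactly zero.
    features["ZeroBalance"] = 1 if row["Balance"] == 0 else 0
    # Products per year with the bank (tenure + 1 avoids dividing by zero for new customers).
    features["ProductsPerTenureYear"] = row["NumOfProducts"] / (row["Tenure"] + 1)
    return features
```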
Model Selection:
- Choose suitable classification algorithms (e.g., Random Forest, Gradient Boosting, Logistic Regression), or combine them all in an ensemble.
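One common way to combine several classifiers is soft voting: average the churn probabilities predicted by each model, optionally weighted, and apply a threshold to the mean. The sketch below shows only the combination step; whether the repository uses soft voting, hard voting, or stacking is not stated. scikit-learn offers the same idea as `VotingClassifier(voting="soft")`.

```python
def soft_vote(model_probs, weights=None, threshold=0.5):
    """Combine per-model churn probabilities by (weighted) averaging.

    model_probs: one list of probabilities per model, all of the same length
    weights: optional relative importance of each model (defaults to equal)
    Returns (averaged probabilities, 0/1 predictions).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_samples = len(model_probs[0])
    # Weighted mean of the models' probabilities for each sample.
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]
    predictions = [1 if p >= threshold else 0 for p in averaged]
    return averaged, predictions
```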
Model Training:
- Train the selected models using the preprocessed data.
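To make the training step concrete, the dependency-free sketch below fits a logistic regression (one of the candidate models above) by batch gradient descent on already-preprocessed numeric features. The learning rate and epoch count are arbitrary assumptions, and a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    X: list of feature vectors (already scaled); y: list of 0/1 churn labels.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the averaged gradient.
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_proba(weights, bias, x):
    """Predicted churn probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```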
Model Evaluation:
- Evaluate model performance with suitable metrics such as accuracy, precision, recall, F1-score, and ROC AUC.
- Perform cross-validation to ensure generalization.
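The listed metrics all derive from the confusion matrix, and ROC AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (ties count as half). The sketch below computes these and generates k-fold train/test index splits for cross-validation. The function names are hypothetical; `sklearn.metrics` and `StratifiedKFold` provide production versions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels (1 = churn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC AUC as P(churner scored above non-churner), counting ties as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; the k test folds partition range(n_samples)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size
```

The pairwise AUC costs O(n_pos * n_neg), which is fine for illustration but slow on large data, and the folds here are contiguous and unshuffled; real cross-validation would shuffle and, for an imbalanced target such as churn, stratify.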
Hyperparameter Tuning:
- Tune the hyperparameters of the selected models with Optuna to optimize performance.
Deployment:
- Deploy the trained model as a user-friendly web application built with Streamlit.
- Users can input customer information and get predictions on whether the customer is likely to churn.
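A Streamlit front end for this step might look roughly like the sketch below. `build_input_row`, `predict_churn`, and `placeholder_model` are hypothetical names, and `placeholder_model` returns a fixed 0.5 instead of a real prediction; the actual app would use the trained model's predicted probability. The widgets (`st.number_input`, `st.selectbox`, `st.checkbox`, `st.button`) are standard Streamlit calls, but the layout is illustrative and not the repository's `app.py`.

```python
def build_input_row(credit_score, geography, gender, age, tenure, balance,
                    num_of_products, has_cr_card, is_active_member, estimated_salary):
    """Collect form values into one row keyed by the dataset's attribute names."""
    return {
        "CreditScore": credit_score,
        "Geography": geography,
        "Gender": gender,
        "Age": age,
        "Tenure": tenure,
        "Balance": balance,
        "NumOfProducts": num_of_products,
        "HasCrCard": 1 if has_cr_card else 0,
        "IsActiveMember": 1 if is_active_member else 0,
        "EstimatedSalary": estimated_salary,
    }


def placeholder_model(row):
    """Hypothetical stand-in for the trained model: always returns 0.5."""
    return 0.5


def predict_churn(row, model):
    """Call a model (any callable: row -> probability) and validate its output."""
    probability = float(model(row))
    if not 0.0 <= probability <= 1.0:
        raise ValueError("model must return a probability in [0, 1]")
    return probability


def main():
    import streamlit as st  # imported here so the helpers above work without Streamlit

    st.title("Bank Customer Churn Prediction")
    credit_score = st.number_input("Credit score", min_value=0, value=650)
    geography = st.selectbox("Geography", ["France", "Spain", "Germany"])
    gender = st.selectbox("Gender", ["Male", "Female"])
    age = st.number_input("Age", min_value=0, value=40)
    tenure = st.number_input("Tenure (years)", min_value=0, value=3)
    balance = st.number_input("Balance", min_value=0.0, value=0.0)
    num_of_products = st.number_input("Number of products", min_value=1, value=1)
    has_cr_card = st.checkbox("Has credit card")
    is_active_member = st.checkbox("Is active member")
    estimated_salary = st.number_input("Estimated salary", min_value=0.0, value=50000.0)

    if st.button("Predict"):
        row = build_input_row(credit_score, geography, gender, age, tenure, balance,
                              num_of_products, has_cr_card, is_active_member, estimated_salary)
        probability = predict_churn(row, placeholder_model)
        st.write(f"Estimated churn probability: {probability:.1%}")


if __name__ == "__main__":
    main()
```

Keeping the form-to-row mapping and the prediction call outside the UI function makes them testable without starting Streamlit.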
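To run the app locally, clone the repository:

```bash
git clone https://github.com/luficerg/churn.ai
cd churn.ai
```

Create and activate a conda environment:

```bash
conda create -n venv python=3.10 -y
conda activate venv
```

Install the requirements:

```bash
pip install -r requirements.txt
```

Launch the Streamlit app:

```bash
streamlit run app.py
```

Streamlit then serves the app locally (by default at http://localhost:8501). For more detail on how the pipeline works, or to explore it without running Streamlit, see the `src` and `notebook_research` folders.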
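To log experiments to the project's MLflow tracking server on DagsHub, export the following environment variables, using a DagsHub access token as the password:

```bash
export MLFLOW_TRACKING_URI=https://dagshub.com/luficerg/Kaggle-Competitions.mlflow
export MLFLOW_TRACKING_USERNAME=luficerg
export MLFLOW_TRACKING_PASSWORD=<your_dagshub_token>
```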
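MLflow reads these variables from the environment, so a training script can fail fast when one is missing instead of failing partway through a run. The helpers below are hypothetical conveniences, not part of MLflow or this repository.

```python
import os

MLFLOW_ENV_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_env_vars(names=MLFLOW_ENV_VARS, environ=None):
    """Return the names that are unset or blank in `environ` (default: os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


def require_mlflow_env(environ=None):
    """Raise a clear error before training if any MLflow tracking variable is missing."""
    missing = missing_env_vars(environ=environ)
    if missing:
        raise RuntimeError("missing MLflow environment variables: " + ", ".join(missing))
```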
To deploy on AWS, create an IAM user for deployment with the following access:
1. EC2 access: EC2 provides the virtual machine that runs the app.
2. ECR access: Elastic Container Registry (ECR) stores your Docker image in AWS.
Deployment overview:
1. Build a Docker image of the source code.
2. Push the Docker image to ECR.
3. Launch an EC2 instance.
4. Pull the image from ECR on the EC2 instance.
5. Run the Docker image on EC2.
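The five steps map roughly onto the Docker and AWS CLI commands produced by the sketch below. It only returns command strings and does not run them; the helper name, the local image name, the `latest` tag, and publishing Streamlit's default port 8501 from the container are assumptions about how the image is built and run.

```python
def deployment_commands(registry, repository, region, tag="latest", port=8501):
    """Return the shell commands for building, pushing, pulling and running the image.

    registry: ECR registry host, e.g. "<account>.dkr.ecr.<region>.amazonaws.com"
    repository: ECR repository name
    """
    image = f"{registry}/{repository}:{tag}"
    # Authenticate Docker against the ECR registry using the AWS CLI.
    login = (
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}"
    )
    return {
        # Steps 1-2: run on the build machine (or CI runner).
        "build": [
            f"docker build -t {repository}:{tag} .",
            f"docker tag {repository}:{tag} {image}",
            login,
            f"docker push {image}",
        ],
        # Steps 4-5: run on the EC2 instance once it is launched (step 3).
        "deploy": [
            login,
            f"docker pull {image}",
            f"docker run -d -p {port}:{port} {image}",
        ],
    }
```

For example, `deployment_commands("727204150125.dkr.ecr.ap-south-1.amazonaws.com", "churn.ai", "ap-south-1")` yields the commands for this project's ECR repository.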
Attach these policies to the IAM user:
1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
Create an ECR repository to store the Docker image, and save its URI: `727204150125.dkr.ecr.ap-south-1.amazonaws.com/churn.ai`
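Create an EC2 instance (Ubuntu), connect to it, and install Docker.

Optional: update the system packages.

```bash
sudo apt-get update -y
sudo apt-get upgrade -y
```

Required: install Docker and allow the `ubuntu` user to run it without `sudo`.

```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
```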
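Configure the EC2 instance as a self-hosted GitHub Actions runner:
- In the GitHub repository, go to Settings > Actions > Runners > New self-hosted runner.
- Choose the operating system of the EC2 instance.
- Run the displayed commands one by one on the EC2 machine.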
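Finally, add the following secrets to the GitHub repository (Settings > Secrets and variables > Actions):

```
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=ap-south-1
AWS_ECR_LOGIN_URI=
ECR_REPOSITORY_NAME=
```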
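If the workflow assembles the image name as `AWS_ECR_LOGIN_URI/ECR_REPOSITORY_NAME` (an assumption, since the workflow file is not shown here; check `.github/workflows`), the ECR-related secrets can be read off the repository URI saved earlier. The helper `ecr_secrets_from_uri` is hypothetical.

```python
import re

# ECR repository URIs look like <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>.
ECR_URI_PATTERN = re.compile(
    r"^(?P<registry>(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com)"
    r"/(?P<repository>[a-z0-9._/-]+)$"
)


def ecr_secrets_from_uri(uri):
    """Split an ECR repository URI into values for the ECR-related GitHub secrets."""
    match = ECR_URI_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"not an ECR repository URI: {uri!r}")
    return {
        "AWS_REGION": match["region"],
        "AWS_ECR_LOGIN_URI": match["registry"],
        "ECR_REPOSITORY_NAME": match["repository"],
    }
```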