DIABETES PREDICTION MODEL
Dataset to be used: https://www.kaggle.com/uciml/pima-indians-diabetes-database
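A minimal sketch of the first-look steps in section 1 (general view, categorical vs. numerical split, target analysis). It assumes pandas and the Kaggle column names (Glucose, BloodPressure, SkinThickness, Insulin, BMI, Outcome). Two further choices are assumptions, not facts about the original project: zeros in those five measurement columns are treated as "not measured", and a column with fewer than `cat_threshold` distinct values is called categorical. The function name `general_view` is hypothetical.

```python
import pandas as pd

# Assumption: in the Pima data a 0 in these columns stands for "not measured".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def general_view(df: pd.DataFrame, target: str = "Outcome", cat_threshold: int = 10) -> dict:
    """Summarize size, variable types, missing values, implausible zeros and class balance."""
    # Heuristic (assumption): few distinct values -> treat the column as categorical.
    categorical = [c for c in df.columns if df[c].nunique() < cat_threshold]
    numerical = [c for c in df.columns if c not in categorical]
    zero_cols = [c for c in ZERO_AS_MISSING if c in df.columns]
    return {
        "shape": df.shape,
        "categorical": categorical,
        "numerical": numerical,
        # Explicit NaNs per column.
        "n_missing": df.isnull().sum().to_dict(),
        # Zeros that are probably hidden missing values.
        "n_zeros": (df[zero_cols] == 0).sum().to_dict(),
        # Class balance of the target, as proportions.
        "target_ratio": df[target].value_counts(normalize=True).to_dict(),
    }
```

Usage, assuming the downloaded CSV is saved locally as `diabetes.csv` (file name is an assumption): `df = pd.read_csv("diabetes.csv")`, then `general_view(df)`. The zero counts usually motivate the "Missing Values Analysis" step of section 2.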
1. Exploratory Data Analysis and Data Visualization
- General View
- Categorical Variables Analysis
- Numerical Variables Analysis
- Target Analysis
2. Data Preprocessing and Feature Engineering
- General View (Recap of the Dataset)
- Outlier Analysis
- Missing Values Analysis
- Feature Creation
- Label and/or One-Hot Encoding
- Standardization
- Save the Final Dataset as a Pickle File
3. Modeling
- Logistic Regression
- Naive Bayes Classifier
- K-Nearest Neighbors Classifier
- Support Vector Machines
- Artificial Neural Network Models
- DecisionTreeClassifier
- BaggingClassifier
- RandomForestClassifier
- AdaBoostClassifier
- Gradient Boosting Classifier
- XGBoost - XGBClassifier
- LightGBM - LGBMClassifier
- CatBoost - CatBoostClassifier
- NGBoost - NGBClassifier
4. Pickle the Models: Save Them for Later Use
5. Comparison of Metrics Across All Models
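The preprocessing steps of section 2 can be sketched as one function. This is a minimal sketch under stated assumptions, not the project's actual pipeline: zeros in five measurement columns are treated as missing and imputed with the column median; outliers are capped with the common 1.5 x IQR rule (one possible choice); the new feature is a hypothetical `BMI_Cat` column built from the WHO BMI cut-offs (18.5, 25, 30); and `StandardScaler` from scikit-learn handles standardization. The names `preprocess` and `cap_outliers_iqr` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Assumption: a 0 in these columns means "not measured" rather than a true zero.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def preprocess(df: pd.DataFrame, target: str = "Outcome") -> pd.DataFrame:
    """Missing values -> outliers -> feature creation -> one-hot encoding -> standardization."""
    df = df.copy()
    features = [c for c in df.columns if c != target]
    df[features] = df[features].astype(float)

    # Missing values: turn implausible zeros into NaN, then impute with column medians.
    cols = [c for c in ZERO_AS_MISSING if c in df.columns]
    df[cols] = df[cols].replace(0, np.nan)
    df[cols] = df[cols].fillna(df[cols].median())

    # Outliers: cap every feature with the IQR rule (the target is left untouched).
    for c in features:
        df[c] = cap_outliers_iqr(df[c])

    # Feature creation (hypothetical example): BMI bands using the WHO cut-offs.
    df["BMI_Cat"] = pd.cut(df["BMI"], bins=[0, 18.5, 25, 30, np.inf], right=False,
                           labels=["Underweight", "Normal", "Overweight", "Obese"])

    # One-hot encoding of the new categorical column (first level dropped).
    df = pd.get_dummies(df, columns=["BMI_Cat"], drop_first=True, dtype=int)

    # Standardization: zero mean, unit variance for the original numeric features only.
    df[features] = StandardScaler().fit_transform(df[features])
    return df
```

With the Kaggle file loaded as `raw`, `preprocess(raw).to_pickle("diabetes_prepared.pkl")` would cover the "Save the Final Dataset" step (file name hypothetical), and `pd.read_pickle` loads it back. One design caveat: fitting the medians, IQR bounds and scaler on the whole dataset leaks information from the test rows; in a real run these would be fitted on the training split only and then applied to the test split.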
--> Steps to follow for each model:
- Model and Prediction
- Model Evaluation
- Model Tuning
- Model Visualization (Feature Importances, ROC Curve and AUC, Confusion Matrix, etc.)
- Saving the Model
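The per-model steps above, plus the metric comparison of section 5, can be sketched with scikit-learn for three of the listed models. This is an illustrative sketch, not the project's code: it uses synthetic data from `make_classification` as a stand-in with the Pima shape (768 rows, 8 features, roughly 65/35 class balance), and the 80/20 stratified split, 10-fold `GridSearchCV` scored by ROC AUC, and the small parameter grids are all assumptions. `run_model_steps` is a hypothetical helper name.

```python
import pickle

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def run_model_steps(model, param_grid, X_train, X_test, y_train, y_test):
    """Model and prediction, evaluation, and tuning for one classifier."""
    # Model and prediction with default hyperparameters.
    model.fit(X_train, y_train)
    base_acc = accuracy_score(y_test, model.predict(X_test))

    # Model tuning: 10-fold grid search on the training split only, scored by ROC AUC.
    search = GridSearchCV(model, param_grid, cv=10, scoring="roc_auc")
    search.fit(X_train, y_train)
    tuned = search.best_estimator_

    # Evaluation of the tuned model on the held-out test split.
    y_pred = tuned.predict(X_test)
    metrics = {
        "base_accuracy": base_acc,
        "tuned_accuracy": accuracy_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, tuned.predict_proba(X_test)[:, 1]),
    }
    return tuned, metrics, confusion_matrix(y_test, y_pred)


# Synthetic stand-in shaped like the Pima data (768 rows, 8 features); NOT real records.
X, y = make_classification(n_samples=768, n_features=8, weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

candidates = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}),
    "KNeighborsClassifier": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11, 21]}),
    "DecisionTreeClassifier": (DecisionTreeClassifier(random_state=0),
                               {"max_depth": [2, 3, 5, None]}),
}

rows, saved, matrices = [], {}, {}
for name, (model, grid) in candidates.items():
    tuned, metrics, cm = run_model_steps(model, grid, X_train, X_test, y_train, y_test)
    matrices[name] = cm
    # Saving the model: pickle.dumps gives bytes; pickle.dump to a "wb" file writes them to disk.
    saved[name] = pickle.dumps(tuned)
    rows.append({"model": name, **metrics})

# Comparison of metrics across models, best ROC AUC first.
comparison = pd.DataFrame(rows).sort_values("roc_auc", ascending=False).reset_index(drop=True)
print(comparison)
```

For the visualization step, scikit-learn 1.0+ offers `RocCurveDisplay.from_estimator` and `ConfusionMatrixDisplay.from_estimator` (matplotlib required), and tree-based models expose `feature_importances_`. The XGBoost, LightGBM and CatBoost classifiers follow the same `fit`/`predict_proba` interface, so they can typically be added to `candidates` in the same way.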