Framingham Heart Study - Cardiovascular Disease Prediction

This repository contains a Jupyter Notebook for our final project in a machine learning course. The project aims to predict cardiovascular disease using the Framingham Heart Study dataset. We explored various machine learning algorithms and preprocessing techniques to find the most effective method for accurate predictions.

Dataset

We utilized the Framingham Heart Study dataset available on Kaggle. The dataset includes various health-related features to predict the risk of cardiovascular disease.

Project Overview

Our project focuses on comparing the performance of different machine learning models under various data splitting, imbalance handling, and hyperparameter tuning scenarios. The scenarios include:

Data Splitting Ratios: test-to-training splits of 1:9, 2:8, and 3:7 (10%, 20%, and 30% of the data held out for testing).
Imbalance Handling Techniques:
- Imbalanced Data (original dataset)
- Undersampling
- Oversampling using SMOTE
Hyperparameter Tuning: Testing models with and without hyperparameter tuning.
Machine Learning Algorithms:
- Decision Tree
- Random Forest
- k-Nearest Neighbor (k-NN)
- Extreme Gradient Boosting (XGBoost)
- Support Vector Machines (SVM)
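
The scenario axes above can be combined into a single experiment grid. The sketch below is illustrative only: it assumes every combination of split, imbalance handling, tuning setting, and algorithm is evaluated, and the string labels and dictionary keys are hypothetical names rather than identifiers taken from the notebooks.

```python
from itertools import product

# Scenario axes as listed above; the labels are illustrative placeholders.
test_sizes = [0.1, 0.2, 0.3]  # 1:9, 2:8, and 3:7 test-to-training splits
imbalance_handling = ["imbalanced", "undersampling", "smote"]
tuning = [False, True]  # without and with hyperparameter tuning
models = ["decision_tree", "random_forest", "knn", "xgboost", "svm"]

# One dictionary per scenario, assuming a full cross of all four axes.
scenarios = [
    {"test_size": t, "imbalance": i, "tuned": h, "model": m}
    for t, i, h, m in product(test_sizes, imbalance_handling, tuning, models)
]

print(len(scenarios))  # 3 * 3 * 2 * 5 = 90
```

Enumerating the grid up front makes it easy to loop over scenarios and store one result row per combination.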

Results

The main findings from our experiments are as follows:

Optimal Preprocessing: The best preprocessing technique for this dataset was oversampling using SMOTE, which effectively handled class imbalance.
Best Model: The Random Forest algorithm provided the highest accuracy for cardiovascular disease prediction when using SMOTE, a 9:1 training/testing data split, and hyperparameter tuning.
Impact of Hyperparameter Tuning: Hyperparameter tuning improved model performance in most cases, but the effect was not uniform. Depending on the scenario, tuned models performed better than, worse than, or the same as their untuned counterparts.
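
To illustrate the SMOTE oversampling reported above as the best preprocessing step, here is a minimal pure-Python sketch of its core idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the code used in the notebooks; the function name `smote_like_oversample`, its parameters, and the toy data are assumptions made for the example. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and one of their k nearest minority neighbours (simplified SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples with two features each (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the range spanned by the existing minority data.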

Conclusion

Our study demonstrates that careful preprocessing and hyperparameter tuning are crucial for optimizing machine learning models in predicting cardiovascular disease. The Random Forest algorithm, in particular, showed superior performance under the tested conditions.
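
The tuned versus untuned comparison discussed above can be sketched with scikit-learn as follows. This is a hedged example, not the notebooks' actual code: it uses synthetic stand-in data rather than the Framingham dataset, and the parameter grid, the stratified split, and the 3-fold cross-validation are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (NOT the Framingham dataset).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.85, 0.15], random_state=0
)

# 9:1 training/testing split; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Untuned baseline: Random Forest with default hyperparameters.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Tuned model: a small, illustrative grid searched with cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_train, y_train)
tuned_acc = accuracy_score(y_test, search.predict(X_test))

print(f"untuned accuracy: {baseline_acc:.3f}")
print(f"tuned accuracy:   {tuned_acc:.3f}")
print("best parameters:", search.best_params_)
```

Evaluating both models on the same held-out test set keeps the comparison fair, and the tuned score can come out higher, lower, or equal to the baseline.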

