
This repository contains code for PCA (Principal Component Analysis) applied in two ways to the fraud data set from a Kaggle competition, which contains more than 500 variables: PCA with a fixed number of components to reduce the number of dimensions, and PCA with a scree plot to find the smallest number of components that explains most of the variance.


krishcy25/DimensionalityReduction-PrincipalComponentAnalysis-Using-Python


DimensionalityReduction-PrincipalComponentAnalysis-Using-Python

This repository includes the code I wrote to reduce the number of dimensions in a dataset. Dimensionality reduction matters in machine learning when a dataset has hundreds or thousands of variables.

Kaggle Competition

I used PCA to reduce the dimensions of the dataset while preparing predictions for the Fraud Modeling Kaggle competition, because the data contains more than 500 variables in total. I took the competition's train set (train_identity and train_transaction), reduced its many variables to a smaller number of dimensions, and built the ML models for submission on the reduced data. Using dimensionality reduction increased the accuracy rate by 4%.

The competition link can be found below: https://www.kaggle.com/c/ieee-fraud-detection


When should I use PCA?

Consider using PCA if you answer yes to the questions below:

Do you want to reduce the number of variables, but aren’t able to identify variables to remove from consideration entirely?

Do you want to ensure your variables are uncorrelated with one another?

Are you comfortable making your independent variables less interpretable?
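To make the second question concrete, here is a minimal sketch of what PCA does to correlated variables. It uses synthetic data and plain NumPy rather than the notebook's code, and the variable names are illustrative. The projected components come out uncorrelated, which is a weaker property than full statistical independence.

```python
import numpy as np

# Synthetic data (an assumption for illustration): two strongly correlated features.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2])

# Center the data, then take the eigenvectors of its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project onto the eigenvectors: these projections are the principal components.
scores = X_centered @ eigvecs

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation before PCA: {corr_before:.3f}")
print(f"correlation after PCA:  {corr_after:.1e}")
```

The cost is the one named in the third question: each component is a weighted mix of the original columns, so it no longer maps to a single named variable.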

This repository contains the notebook "PCA-Dimensionality Reduction.ipynb", which has the two parts below.

PCA Steps in the Notebook code:

PCA Algorithm, Part 1: force the algorithm to use only 4 components.

PCA Algorithm, Part 2: build PCA to find the number of components that explains most of the variance.
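The two parts above can be sketched with scikit-learn as follows. This is not the notebook's code: the data is synthetic, the 0.95 variance threshold is an assumed cutoff, and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fraud features (hypothetical data, not the
# Kaggle files): 5 latent factors spread across 30 observed columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))
mixing = rng.normal(size=(5, 30))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 30))

# PCA is sensitive to scale, so standardize each column first.
X_scaled = StandardScaler().fit_transform(X)

# Part 1: force PCA to keep exactly 4 components.
pca_fixed = PCA(n_components=4)
X_fixed = pca_fixed.fit_transform(X_scaled)
print(X_fixed.shape)  # (300, 4)

# Part 2: fit all components, then keep the smallest number whose
# cumulative explained variance reaches the threshold (0.95 is an assumption).
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold)) + 1
print(n_components, cumulative[n_components - 1])

# Refit with the chosen number of components to get the reduced data.
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
```

Plotting `pca_full.explained_variance_ratio_` (or `cumulative`) against the component index gives the scree plot. The threshold is a judgment call, and looking for the elbow in the plot is a common alternative to a fixed cutoff.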
