This project was developed as part of the iNeuron.ai internship. Its primary objective is to build a machine learning model that accurately predicts whether a website is phishing or safe.
Phishing is an attempt to fraudulently obtain sensitive or confidential information from an internet user by posing as a trusted person or entity. It is a social engineering attack that exploits weaknesses in system processes caused by the users of those systems.
- Website URL analysis for phishing indicators.
- Machine learning model integration for accurate phishing detection.
- Web interface for easy interaction (entering features manually gives an accuracy of 96.66%).
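As a rough illustration of what URL analysis for phishing indicators can look like, the sketch below derives a few lexical features from a URL using only the Python standard library. The function name `extract_url_features` and the feature names are hypothetical and chosen for illustration; they are not the feature set of the dataset used in this project.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical indicators for a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Very long URLs are a commonly cited phishing indicator.
        "url_length": len(url),
        # Many dots in the host can indicate nested subdomains.
        "num_dots": host.count("."),
        # An "@" anywhere in the URL can hide the real destination.
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # A raw IPv4 address instead of a domain name.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "num_hyphens_in_host": host.count("-"),
    }


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/login@secure"))
```

A dictionary like this could be turned into a feature vector and passed to a classifier, provided the features match those the model was trained on.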
https://data.mendeley.com/datasets/72ptz43s9v/1
MySQL
DVC: Data Version Control
dvc init
dvc add
dvc pull
dvc push
For feature selection: Yellowbrick (a visualization library that extends the scikit-learn API)
For building the Random Forest Classifier model
For end-to-end project
The ML model is deployed with a Flask frontend and as a static web app on Azure.
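A minimal sketch of how a Flask prediction endpoint might be wired up is shown below. It is not the project's actual application: the `/predict` route, the `FEATURE_NAMES` list, and the `predict_label` placeholder rule are all assumptions made for illustration, and a real deployment would load the trained model instead of the placeholder.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature order; the project's real feature list is not given here.
FEATURE_NAMES = ["url_length", "num_dots", "has_https"]


def predict_label(features):
    """Placeholder standing in for the trained model (an invented rule)."""
    url_length, _num_dots, has_https = features
    return "phishing" if url_length > 75 and not has_https else "safe"


@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON object mapping feature names to numeric values.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURE_NAMES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    features = [float(payload[name]) for name in FEATURE_NAMES]
    return jsonify({"prediction": predict_label(features)})
```

The endpoint can be exercised without starting a server by using Flask's built-in test client, for example `app.test_client().post("/predict", json={...})`.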
The model obtained after hyperparameter tuning had the following parameters:
RandomForestClassifier(max_depth=20, min_samples_leaf=2, min_samples_split=13, n_estimators=105)
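To show how a classifier with these parameters can be trained and evaluated, the sketch below fits it on synthetic data generated with scikit-learn. The synthetic dataset, the train/test split, and `random_state=42` are assumptions added for a reproducible illustration; the accuracy it prints says nothing about the results on the project's real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the project itself uses the Mendeley dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters as reported above; random_state is added for reproducibility.
model = RandomForestClassifier(
    max_depth=20,
    min_samples_leaf=2,
    min_samples_split=13,
    n_estimators=105,
    random_state=42,
)
model.fit(X_train, y_train)

# Evaluate on the held-out portion of the synthetic data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy on synthetic data: {accuracy:.3f}")
```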
Classification Report using Yellowbrick
conda create -p venv python==3.11.4 -y
Installing Dependencies
pip install -r requirements.txt
git add .
git commit -m "Initial commit"
git branch -M main
git remote add origin <github_url>
git push -u origin main
git commit -m "proper message"
git push -u origin main