shresthasingh1501/Spam-Mail-Detection: Spam mail detection using Multinomial Naive Bayes and NLTK

This project aims to detect spam emails using natural language processing (NLP) techniques and machine learning. The dataset contains emails labeled as spam or non-spam, and the goal is to build a model that can classify emails accurately.

Dataset

The dataset used in this project is a CSV file containing email data with the following columns:

label_num: 1 for spam, 0 for non-spam
text: The content of the email
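As a minimal sketch of reading this layout, the snippet below loads a CSV with pandas and keeps only the two columns described above. The three inline rows are invented stand-ins so the example runs without the dataset file, and `load_emails` is a hypothetical helper name, not something taken from the project.

```python
import io

import pandas as pd

# Invented three-row stand-in for the real CSV, so the sketch runs anywhere.
SAMPLE_CSV = io.StringIO(
    "label_num,text\n"
    "1,Claim your free prize now\n"
    "0,Meeting moved to Monday\n"
    "0,Lunch at noon?\n"
)


def load_emails(source):
    """Read the CSV and keep only the label and text columns."""
    df = pd.read_csv(source)
    return df[["label_num", "text"]]


emails = load_emails(SAMPLE_CSV)
print(emails.shape)
# Class balance: how many rows are spam (1) versus non-spam (0).
print(emails["label_num"].value_counts().to_dict())
```

To try it on the real data, pass the path of the dataset CSV to `load_emails` instead of `SAMPLE_CSV`.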

Preprocessing

Column Renaming: The columns are renamed to more readable names.
Handling Missing and Duplicate Values: Missing values are checked, and duplicates are removed.
Text Preprocessing: The email content is preprocessed by converting to lowercase, removing non-alphanumeric characters, removing stop words, and stemming the words.
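The three steps above could look roughly like the sketch below. It assumes pandas and NLTK's `PorterStemmer`. The new column name `target`, the helper names `clean_frame` and `transform_text`, and the demo rows are all hypothetical; the README does not say which stemmer or column names the project actually uses.

```python
import re

import pandas as pd
from nltk.stem import PorterStemmer

_stemmer = PorterStemmer()  # assumption: Porter stemming; the project may use another stemmer


def clean_frame(df):
    """Rename the label column, report missing values, and drop duplicate rows."""
    df = df.rename(columns={"label_num": "target"})  # "target" is a hypothetical name
    print(df.isnull().sum().to_dict())  # missing values per column
    return df.drop_duplicates(keep="first").reset_index(drop=True)


def transform_text(text, stop_words=None):
    """Lowercase, keep alphanumeric tokens, drop stop words, stem the rest."""
    if stop_words is None:
        # Requires nltk.download('stopwords') to have been run once.
        from nltk.corpus import stopwords
        stop_words = set(stopwords.words("english"))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [tok for tok in tokens if tok not in stop_words]
    return " ".join(_stemmer.stem(tok) for tok in kept)


# Invented demo rows; the first two are exact duplicates.
demo = pd.DataFrame({
    "label_num": [1, 1, 0],
    "text": ["Win FREE prizes!!!", "Win FREE prizes!!!", "See you at the meeting."],
})
cleaned = clean_frame(demo)
# A tiny explicit stop-word set keeps the demo independent of the NLTK corpus download.
demo_stop_words = {"you", "at", "the"}
cleaned["transformed_text"] = cleaned["text"].apply(
    lambda t: transform_text(t, stop_words=demo_stop_words)
)
print(cleaned["transformed_text"].tolist())
```

Passing `stop_words=None` falls back to NLTK's English stop-word list, which is presumably what the `stopwords` download in the Setup section is for.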

Feature Engineering

Three count-based features are added to each email for analysis:

num_characters: Number of characters in the email
num_words: Number of words in the email
num_sentences: Number of sentences in the email
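A rough way to compute these three counts is sketched below using only the standard library. The regular expressions are simplifying assumptions. The `punkt` download in the Setup section suggests the project may rely on NLTK's tokenizers instead, which would give slightly different word and sentence counts. `basic_counts` is a hypothetical helper name.

```python
import re


def basic_counts(text):
    """Return character, word, and sentence counts for one email body."""
    words = re.findall(r"\w+", text)
    # Heuristic: each run of '.', '!' or '?' ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "num_characters": len(text),
        "num_words": len(words),
        "num_sentences": len(sentences),
    }


counts = basic_counts("Hello there. Win a prize now!")
print(counts)
```

Applied row by row to the `text` column, the returned dictionary maps directly onto the three feature columns listed above.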

Model Training

Text Vectorization: The text is transformed using TfidfVectorizer.
Train-Test Split: The dataset is split into training and testing sets.
Model Selection: A Multinomial Naive Bayes model is trained on the training set.
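The training steps can be sketched with scikit-learn as follows. The eight toy messages, the 25% test size, and `random_state=42` are assumptions chosen for illustration; the README does not state the project's split ratio or hyperparameters. This sketch fits the vectorizer after the split so that no test-set vocabulary leaks into training, which may differ from the order the project uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Invented toy corpus: 1 = spam, 0 = non-spam.
texts = [
    "free prize claim now",
    "free cash winner click",
    "win free money today",
    "free offer click claim",
    "meeting agenda attached",
    "meeting moved to monday",
    "project meeting notes",
    "lunch after the meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out 25% of the messages, keeping the class ratio the same in both parts.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Learn the TF-IDF vocabulary on the training messages only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

model = MultinomialNB()
model.fit(X_train, y_train)

# Classify an unseen message; words missing from the vocabulary are ignored.
new_pred = model.predict(vectorizer.transform(["claim your free prize"]))
print(new_pred)
```

In the real pipeline the inputs would be the preprocessed email texts rather than raw strings.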

Evaluation

The model is evaluated on the test set using accuracy, a confusion matrix, and a classification report. The confusion matrix is also plotted as a heatmap.
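As an illustration of these three metrics, the sketch below uses scikit-learn's `accuracy_score`, `confusion_matrix`, and `classification_report` on invented label vectors. They are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels for illustration: 1 = spam, 0 = non-spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(y_true, y_pred)
report = classification_report(y_true, y_pred, target_names=["non-spam", "spam"])

print(acc)
print(cm)
print(report)
```

For the plot, one option is to pass `cm` to `seaborn.heatmap(cm, annot=True)`. Whether the project uses seaborn for this is an assumption.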

Setup

Clone the repository:

```
git clone https://github.com/shresthasingh1501/Spam-Mail-Detection.git
cd Spam-Mail-Detection
```

Install the required packages:
```
pip install -r requirements.txt
```

Download NLTK data:

```
import nltk
nltk.download('stopwords')
nltk.download('punkt')
```

Usage

Run the preprocessing and model training script:
```
python train_model.py
```

Results

The evaluation results (accuracy, confusion matrix, and classification report) are printed to the console. The confusion matrix is also visualized as a heatmap.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
