Issue #1783 solved
Created a Fake News Detection system using machine learning algorithms: Logistic Regression, Decision Tree, Gradient Boosting, and Random Forest. The dataset consists of real and fake news articles. The text is preprocessed, features are extracted with TF-IDF, and each of the four models is trained and evaluated.
Description:
Summary of the fake news detection project:
Libraries: Various Python libraries are used, including pandas for data manipulation, matplotlib and seaborn for visualization, re and string for text preprocessing, and scikit-learn for machine learning.
Data Loading: The fake and true news datasets are loaded from CSV files (Fake.csv and True.csv). The datasets are combined, with a "class" column added (0 for fake news, 1 for true news).
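The loading and labeling step could look roughly like the sketch below. The helper name `combine_and_label` is hypothetical, and the assumption that the CSV files sit in the working directory is not confirmed by this description.

```python
import pandas as pd


def combine_and_label(fake_df: pd.DataFrame, true_df: pd.DataFrame) -> pd.DataFrame:
    """Label fake rows 0 and true rows 1, then stack them into one frame."""
    # Copy first so the caller's frames are not modified.
    fake_df = fake_df.copy()
    true_df = true_df.copy()
    fake_df["class"] = 0
    true_df["class"] = 1
    return pd.concat([fake_df, true_df], axis=0, ignore_index=True)


# With the project's files this would be called as (local paths are an assumption):
# data = combine_and_label(pd.read_csv("Fake.csv"), pd.read_csv("True.csv"))
```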
Data Preprocessing:
Unnecessary columns (title, subject, date) are dropped.
Text data is cleaned by converting it to lowercase, removing special characters, URLs, and numbers.
The dataset is shuffled and split into training and testing sets.
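A minimal sketch of the text-cleaning step, using only `re` and `string`. The exact regular expressions used in the project are not given here, so the patterns below are assumptions, and `clean_text` is an illustrative name. Shuffling and splitting are not shown.

```python
import re
import string


def clean_text(text: str) -> str:
    """Lowercase the text and strip URLs, punctuation, and digit-bearing tokens."""
    text = text.lower()
    # Remove URLs before punctuation so "https://..." is dropped as one unit.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Replace every punctuation character with a space.
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    # Drop any token that contains a digit.
    text = re.sub(r"\w*\d\w*", " ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Breaking NEWS!!! Visit https://example.com now, 100% true"))
# prints "breaking news visit now true"
```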
Feature Extraction: The text is converted into numerical representations using the TfidfVectorizer to transform the text into term frequency-inverse document frequency (TF-IDF) vectors.
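As an illustration of this step, the sketch below fits a `TfidfVectorizer` on training text and reuses it for test text. The sentences are made-up placeholders, and default vectorizer settings are assumed because the description does not list any parameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder sentences, not taken from the project's dataset.
train_texts = [
    "president signs new budget bill",
    "aliens endorse local candidate",
    "senate passes budget bill",
]
test_texts = ["aliens sign budget bill"]

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training text only.
X_train = vectorizer.fit_transform(train_texts)
# Reuse the fitted vocabulary so train and test share the same columns.
X_test = vectorizer.transform(test_texts)

print(X_train.shape, X_test.shape)
```

Fitting on the training split alone keeps information from the test articles out of the IDF weights.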
Machine Learning Models:
Logistic Regression: Achieved an accuracy of 98.78%.
Decision Tree Classifier: Achieved an accuracy of 99.53%.
Gradient Boosting Classifier: Achieved an accuracy of 99.44%.
Random Forest Classifier: Achieved an accuracy of 98.69%.
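The four classifiers can be trained behind a common loop, as in the sketch below. The toy corpus, the `random_state` values, and the otherwise default hyperparameters are all assumptions; this does not reproduce the accuracies listed above.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy placeholder data: 0 = fake, 1 = true (same labeling as the project).
texts = [
    "aliens endorse local candidate",
    "miracle pill cures everything overnight",
    "celebrity secretly replaced by clone",
    "senate passes budget bill",
    "central bank raises interest rates",
    "city council approves new park",
]
labels = [0, 0, 0, 1, 1, 1]

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the same TF-IDF matrix and collect its predictions.
predictions = {}
for name, model in models.items():
    model.fit(X, labels)
    predictions[name] = model.predict(X)
```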
Model Evaluation: Precision, recall, f1-score, and accuracy are calculated for each model using the test dataset. All models perform well, with accuracy close to 99%.
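To make the four metrics concrete, here is a hand-rolled sketch for the binary case. The project most likely relied on scikit-learn's metrics utilities, but the description does not say which function was used, so `binary_metrics` is purely illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# 2 true positives, 1 false positive, 1 false negative, 3 of 5 correct.
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting precision and recall alongside accuracy matters when the fake and true classes are not evenly balanced.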
The project demonstrates a successful implementation of multiple machine learning models for detecting fake news based on text data, with Decision Tree and Gradient Boosting performing the best.
Fixes # (issue)
Type of change
Checklist: