Deep Tweets: Politics vs Sports Classification

Project Overview

This project classifies tweets into two categories: politics and sports. It trains two machine learning models, logistic regression and XGBoost, and follows a structured workflow: data preprocessing, exploratory data analysis (EDA), text vectorization using TF-IDF, and model training and evaluation.

Project Structure

The project is organized into several main sections:

Data Preprocessing:
- Read the raw data from the dataset.
- Check for and handle any null or missing values.
- Perform data cleaning, including removing special characters and unwanted symbols.
- Tokenize the text data into individual words.
- Convert all words to lowercase.
- Remove stop words to reduce noise in the data.
- Lemmatize words to reduce inflected forms to their base form.
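
The preprocessing steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the `STOP_WORDS` set and the `LEMMAS` lookup table are small hypothetical stand-ins for a full stop-word list and a real lemmatizer (for example, those shipped with NLTK), and the function names are assumptions.

```python
import re

# Illustrative stop-word subset (assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "in", "on", "of", "to", "for", "at"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {
    "games": "game",
    "players": "player",
    "won": "win",
    "votes": "vote",
    "elections": "election",
}


def preprocess(text):
    """Clean one tweet and return a list of normalized tokens."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)              # drop special characters, digits, symbols
    tokens = text.split()                                 # tokenize on whitespace
    tokens = [t.lower() for t in tokens]                  # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return [LEMMAS.get(t, t) for t in tokens]             # lemmatize via the lookup table


def preprocess_all(tweets):
    """Skip missing values (None or blank strings) and preprocess the rest."""
    return [preprocess(t) for t in tweets if t is not None and t.strip()]


print(preprocess("The Players WON 3 games!!! #final"))
# ['player', 'win', 'game', 'final']
```

Dropping missing rows is only one way to handle null values; filling them with an empty string is an equally plausible choice, depending on the dataset.
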
Exploratory Data Analysis (EDA):
- Check the balance between the two classes (politics and sports) in the dataset.
- Visualize the class distribution using plots (e.g., bar charts).
- Analyze the frequency of words within each label using word clouds or bar plots.
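
As a rough illustration of the numbers behind those plots, the sketch below computes the class balance and the most frequent words per label using only the standard library. The label strings ("Politics", "Sports"), the sample tokens, and the function names are assumptions for the example; the resulting counts could then be passed to any plotting library for bar charts or word clouds.

```python
from collections import Counter


def class_balance(labels):
    """Return each label's share of the dataset as a fraction of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def top_words_per_label(token_lists, labels, n=3):
    """Return the n most frequent tokens for each label."""
    counters = {}
    for tokens, label in zip(token_lists, labels):
        counters.setdefault(label, Counter()).update(tokens)
    return {label: counter.most_common(n) for label, counter in counters.items()}


# Toy data invented for illustration.
sample_tokens = [["election", "vote"], ["vote", "senate"], ["goal", "match"], ["match", "team"]]
sample_labels = ["Politics", "Politics", "Sports", "Sports"]

print(class_balance(sample_labels))
print(top_words_per_label(sample_tokens, sample_labels, n=1))
```
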
Text Vectorization (TF-IDF):
- Convert the preprocessed text data into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization.
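
To make the weighting concrete, here is a from-scratch sketch of the textbook TF-IDF formula. It is meant only to show the arithmetic and is an assumption about the variant, not the project's implementation. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF and vector normalization by default, so their values will differ from these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute textbook TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy documents invented for illustration.
vectors = tfidf([["vote", "election"], ["vote", "match"]])
# "vote" appears in every document, so its weight is 0;
# "election" appears in one of two documents, so its weight is 0.5 * ln(2).
```

A term that occurs in every tweet carries no discriminating information, which is why its weight drops to zero under this formula.
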
Modeling:
- Train a logistic regression model on the TF-IDF transformed data.
- Train an XGBoost model as an alternative classifier.
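
A minimal sketch of this modeling step, assuming scikit-learn and (optionally) the `xgboost` package are installed. The toy tweets, the 0/1 label encoding, and the hyperparameters are invented for illustration and are not taken from the notebook. The XGBoost import is wrapped in a `try` block purely so that the sketch still runs when that package is missing.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

try:  # optional dependency in this sketch
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None

# Toy data invented for illustration; 0 = politics, 1 = sports (assumed encoding).
texts = [
    "senate passes new budget bill",
    "president signs election reform law",
    "parliament debates tax policy vote",
    "governor announces campaign for election",
    "striker scores late goal in final",
    "team wins championship game tonight",
    "coach praises players after match",
    "tennis star wins final match",
]
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Turn the text into TF-IDF vectors, then fit both classifiers on them.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X, labels)

xgb = None
if XGBClassifier is not None:
    xgb = XGBClassifier(n_estimators=20, max_depth=3)  # arbitrary small settings
    xgb.fit(X, labels)


def predict(model, tweets):
    """Vectorize new tweets with the fitted TF-IDF vocabulary and predict labels."""
    return model.predict(vectorizer.transform(tweets)).tolist()


new_tweets = ["senate election vote", "team wins final match"]
log_reg_preds = predict(log_reg, new_tweets)
```

In a real run the vectorizer should be fitted on the training split only and then reused to transform the test split, so that no vocabulary or IDF statistics leak from the evaluation data.
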
Evaluation:
- Evaluate the performance of both models using appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score).
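
For reference, the metrics named above can be computed by hand as in the sketch below. The function name, the label strings, and the choice of "Sports" as the positive class are assumptions for the example; in practice a library routine (such as those in `sklearn.metrics`) would normally be used instead.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy predictions invented for illustration.
y_true = ["Sports", "Sports", "Sports", "Politics", "Politics"]
y_pred = ["Sports", "Sports", "Politics", "Sports", "Politics"]
metrics = binary_metrics(y_true, y_pred, positive="Sports")
# 3 of 5 correct -> accuracy 0.6; tp=2, fp=1, fn=1 -> precision = recall = F1 = 2/3
```

Reporting precision and recall alongside accuracy matters most when the two classes are imbalanced, which is what the class-balance check in the EDA step is for.
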
Conclusion:
- Summarize the results and insights gained from the project.
- Reflect on the effectiveness of the models and suggest potential improvements.

Getting Started

Clone this repository to your local machine.
Set up the required environment by installing the necessary libraries and dependencies: pip install -r requirements.txt
Run the Jupyter Notebook deep_tweets_classification.ipynb to execute the project pipeline.

Usage

Open the Jupyter Notebook deep_tweets_classification.ipynb.
Follow the step-by-step instructions to execute each code cell.
Review the EDA plots, model training process, and evaluation results.

We welcome your feedback, whether you have ideas for improvements or questions about the project.

Made with Love 💌
