Skip to content

7soonyounes/Tweets_classification

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Deep Tweets: Politics vs Sports Classification

Project Overview

This project aims to classify tweets into two categories: politics and sports. The classification is done using machine learning techniques, specifically logistic regression and XGBoost models. The project follows a structured workflow that includes data preprocessing, exploratory data analysis (EDA), text vectorization using TF-IDF, and model training and evaluation.

Project Structure

The project is organized into several main sections:

  1. Data Preprocessing:

    • Read the raw data from the dataset.
    • Check for and handle any null or missing values.
    • Perform data cleaning, including removing special characters and unwanted symbols.
    • Tokenize the text data into individual words.
    • Convert all words to lowercase.
    • Remove stop words to reduce noise in the data.
    • Lemmatize words to reduce inflections.
  2. Exploratory Data Analysis (EDA):

    • Check the balance between the two classes (politics and sports) in the dataset.
    • Visualize the class distribution using plots (e.g., bar charts).
    • Analyze the frequency of words within each label using word clouds or bar plots.
  3. Text Vectorization (TF-IDF):

    • Convert the preprocessed text data into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization.
  4. Modeling:

    • Train a logistic regression model on the TF-IDF transformed data.
    • Train an XGBoost model as an alternative classifier.
  5. Evaluation:

    • Evaluate the performance of both models using appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score).
  6. Conclusion:

    • Summarize the results and insights gained from the project.
    • Reflect on the effectiveness of the models and suggest potential improvements.

Getting Started

  1. Clone this repository to your local machine.
  2. Set up the required environment by installing the necessary libraries and dependencies [ pip install -r requirements.txt ]
  3. Run the Jupyter Notebook deep_tweets_classification.ipynb to execute the project pipeline.

Usage

Provide instructions on how to run the project and any relevant code snippets.

  1. Open the Jupyter Notebook deep_tweets_classification.ipynb.
  2. Follow the step-by-step instructions to execute each code cell.
  3. Review the EDA plots, model training process, and evaluation results.

We welcome your feedback, suggestions, and questions! Whether you have ideas for improvements or questions about the project.

Made with Love 💌

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.8%
  • Dockerfile 0.2%