This is a clickbait classification project - Done for Natural Language Processing at University

RossHolland-Melt/Clickbait-Classifier

NLP: Clickbait Classification

The Clickbait Classification Project compares how well several machine learning algorithms distinguish clickbait from non-clickbait headlines. The objective is to identify which algorithm classifies headlines most accurately, so that it can serve as the basis for filtering out clickbait content. Throughout the project, the models are trained, evaluated, and compared to determine the most efficient and accurate approach to clickbait detection.
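The comparison described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the two candidate models, the TF-IDF features, the toy headlines, and the helper name `compare_models` are all assumptions chosen for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy headlines for illustration only (1 = clickbait, 0 = not).
headlines = [
    "You won't believe what this cat did next",
    "10 tricks that will change your life forever",
    "This one weird tip melts stress instantly",
    "What happened next will shock you",
    "21 photos that prove dogs are the best",
    "Only geniuses can pass this simple quiz",
    "Parliament passes annual budget bill",
    "Central bank holds interest rates steady",
    "Storm causes flooding in coastal towns",
    "Company reports quarterly earnings decline",
    "City council approves new transit plan",
    "Researchers publish study on soil erosion",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical candidate set; the project may compare different algorithms.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}


def compare_models(texts, y, models, folds=3):
    """Return the mean cross-validated accuracy of each candidate model."""
    results = {}
    for name, model in models.items():
        # Each model gets its own TF-IDF step so features are fit per fold.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        scores = cross_val_score(pipeline, texts, y, cv=folds, scoring="accuracy")
        results[name] = scores.mean()
    return results


results = compare_models(headlines, labels, candidates)
best = max(results, key=results.get)  # name of the highest-scoring model
```

Putting the vectorizer inside the pipeline keeps the comparison fair: the vocabulary is learned only from each training fold, so no information leaks from the held-out headlines.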

Technology Stack

Technologies and libraries used:

  • Python: Main programming language.
  • TensorFlow: Machine learning and neural networks library.
  • Pandas: Data manipulation and analysis.
  • NumPy: Numerical operations.
  • Scikit-learn: Machine learning models implementation.
  • NLTK: Natural language processing tasks.
  • Matplotlib/Seaborn: Data visualization.
  • Jupyter Notebook: Interactive code development and documentation.

Dataset

The dataset used for this project is a collection of news headlines from a variety of sources, both clickbait and non-clickbait. Because the two classes are evenly represented, a classifier can be trained on it without correcting for class imbalance. Key features of the dataset include:

  • Headlines: A compilation of headlines from multiple news outlets.
  • Labels: Binary classification where 1 signifies clickbait and 0 signifies non-clickbait.
  • Volume: A total of 32,000 entries split evenly between clickbait and non-clickbait categories.

More information can be found here: https://www.kaggle.com/datasets/amananandrai/clickbait-dataset
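A quick way to load the data and confirm the class balance might look like the sketch below. The column names `headline` and `clickbait` are assumptions and should be checked against the header of the downloaded CSV; the helper names and the four sample rows are invented so the example runs without the real file.

```python
import io

import pandas as pd

# Assumed column names; verify them against the actual CSV header.
TEXT_COL = "headline"
LABEL_COL = "clickbait"


def load_headlines(source):
    """Read the CSV, check the expected columns, and drop incomplete rows."""
    df = pd.read_csv(source)
    missing = {TEXT_COL, LABEL_COL} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")
    df = df.dropna(subset=[TEXT_COL, LABEL_COL]).copy()
    df[LABEL_COL] = df[LABEL_COL].astype(int)
    return df


def class_balance(df):
    """Return the fraction of rows per label (0 = non-clickbait, 1 = clickbait)."""
    return df[LABEL_COL].value_counts(normalize=True).sort_index()


# Invented stand-in for the real file, which would be passed as a path instead.
sample_csv = io.StringIO(
    "headline,clickbait\n"
    "You won't believe what happened next,1\n"
    "Parliament passes annual budget bill,0\n"
    "10 tricks that will change your life,1\n"
    "Central bank holds interest rates steady,0\n"
)

df = load_headlines(sample_csv)
balance = class_balance(df)
```

With the real dataset, `load_headlines` would receive the path to the downloaded CSV, and `balance` should show both labels close to 0.5 if the split is as even as described.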

Further Work

To further enhance the project, additional steps can be taken, such as:

  • Extended EDA: Conducting deeper exploratory analysis to uncover additional insights and potential feature engineering avenues.
  • Advanced Feature Engineering: Experimenting with more sophisticated NLP techniques to improve feature extraction.
  • Extended Hyperparameter Tuning: Implementing a more exhaustive search for hyperparameters to fine-tune the models for better accuracy and performance.
  • Model Ensembling: Combining the predictions of multiple models to improve the final prediction accuracy.
  • Deployment Optimization: Refining the model deployment process for scalability and ease of integration.
  • User Interface Development: Creating a user-friendly interface for non-technical users to utilize the model's capabilities.

These enhancements aim to improve the model's performance, ease of use, and integration into existing systems, making the tool more accessible and efficient for end users.
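Two of the items above, hyperparameter tuning and model ensembling, could be combined along the lines of the following sketch. It is only one possible approach under stated assumptions: the soft-voting ensemble, the choice of base models, the parameter grid, and the toy headlines are all illustrative and not taken from the project.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented toy headlines for illustration only (1 = clickbait, 0 = not).
headlines = [
    "You won't believe what this cat did next",
    "10 tricks that will change your life forever",
    "This one weird tip melts stress instantly",
    "What happened next will shock you",
    "21 photos that prove dogs are the best",
    "Only geniuses can pass this simple quiz",
    "Parliament passes annual budget bill",
    "Central bank holds interest rates steady",
    "Storm causes flooding in coastal towns",
    "Company reports quarterly earnings decline",
    "City council approves new transit plan",
    "Researchers publish study on soil erosion",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Soft voting averages the predicted class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
    ],
    voting="soft",
)

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("vote", ensemble)])

# Hypothetical search space: n-gram range of the features and the
# regularization strength of the logistic regression inside the ensemble.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "vote__lr__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(headlines, labels)

# The search refits the best configuration on all data, so it can predict directly.
predictions = search.predict(
    ["You will never guess what happened", "Senate votes on trade agreement"]
)
```

On the full dataset, a larger grid and more folds would be more appropriate; the small values here only keep the example fast.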
