This repository is dedicated to text detoxification: identifying and neutralizing toxic language in text. It is based on a logistic regression model that classifies sentences as toxic or neutral from word features, and on a BERT model that suggests non-toxic substitutes.
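To make the classification step concrete, here is a minimal sketch of a bag-of-words logistic regression written in plain Python. It is not the code in this repository: the toy sentences, the function names (`train`, `score`, `toxicity`), and the hyperparameters are all assumptions chosen for illustration, and the BERT substitution step is not shown.

```python
import math

# Hypothetical toy data, for illustration only: (sentence, label), 1 = toxic.
TRAIN = [
    ("you are an idiot", 1),
    ("what a stupid idea", 1),
    ("you are a kind person", 0),
    ("what a great idea", 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(words, weights, bias):
    # Binary bag-of-words: each distinct known word contributes its weight once.
    return bias + sum(weights.get(w, 0.0) for w in set(words))


def train(data, epochs=500, lr=0.1):
    # Plain stochastic gradient descent on the logistic loss.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for sentence, label in data:
            words = sentence.split()
            error = label - sigmoid(score(words, weights, bias))
            for w in set(words):
                weights[w] = weights.get(w, 0.0) + lr * error
            bias += lr * error
    return weights, bias


def toxicity(sentence, weights, bias):
    # Probability that the sentence is toxic under the trained model.
    return sigmoid(score(sentence.split(), weights, bias))


weights, bias = train(TRAIN)
print(round(toxicity("you are an idiot", weights, bias), 2))
# Words ranked by learned weight, most toxic first.
print(sorted(weights, key=weights.get, reverse=True)[:3])
```

In a setup like this, the per-word weights double as a rough toxicity score for individual words, so the highest-weighted words in a sentence are the natural candidates to hand to a masked language model for replacement.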
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Before running the project, ensure you have the following installed:
- Python 3.6 or above
- pip
Clone the repository to your local machine and install the dependencies:
git clone https://github.com/MostafaKhaled2017/text-detoxification.git
cd text-detoxification
pip install -r requirements.txt
The repository consists of several components that can be run sequentially to perform data transformation, model training, and predictions.
To get the data:
python src/data/make_dataset.py
To train the models:
python src/models/train.py
To check model predictions:
python src/models/predict.py
To build visualizations:
python src/models/visualize.py
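Since the four steps above are meant to run in order, they can be chained from a small driver script. The sketch below is an assumption, not a file shipped with the repository: the names `PIPELINE`, `build_commands`, and `run_pipeline` and the `--run` flag are hypothetical, and the script is assumed to be started from the repository root. By default it only prints the commands it would run.

```python
import subprocess
import sys

# Script paths as listed above, in the order they are meant to run.
PIPELINE = [
    "src/data/make_dataset.py",
    "src/models/train.py",
    "src/models/predict.py",
    "src/models/visualize.py",
]


def build_commands(python=sys.executable):
    # One command per step, using the same interpreter that runs this script.
    return [[python, script] for script in PIPELINE]


def run_pipeline(dry_run=True):
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "--run" is a hypothetical flag of this sketch, not of the repository's scripts.
    run_pipeline(dry_run="--run" not in sys.argv[1:])
```

Stopping at the first failure avoids training or predicting on a dataset that was never downloaded.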
This approach is based on the methodology proposed in the paper "Text Detoxification using Large Pre-trained Neural Models".
- Name: Mostafa Kira
- Email: [email protected]
- Group: B21-DS02, Innopolis University