profanity-predictor is a solution for real-time profanity prediction based on multimodal analysis of speech, using both its audio and textual channels.
The proposed pipeline can process an audio stream in standby mode. It converts the signal to MFCCs to capture information from the audio channel and runs ASR to extend the feature set with labels of the preceding words. The prediction model is an LSTM with attention layers.
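The repository's own feature-extraction code is not reproduced here. As an illustration only, the standard MFCC computation (framing, Hamming window, power spectrum, mel filterbank, log, DCT) can be sketched in NumPy as follows. The parameter values (16 kHz mono audio, 25 ms frames, 10 ms hop, 40 mel filters, 13 coefficients) are assumptions, not necessarily the ones the project uses.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr=16000, n_mfcc=13, n_fft=512, hop=160, win=400, n_filters=40):
    # Assumed parameters: 25 ms windows (400 samples) with a 10 ms hop at 16 kHz.
    signal = np.asarray(signal, dtype=float)
    if len(signal) < win:  # pad very short inputs to one full frame
        signal = np.pad(signal, (0, win - len(signal)))
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_mel, n_mfcc)  # shape: (n_frames, n_mfcc)
```

For one second of 16 kHz audio this yields 98 frames of 13 coefficients each. In practice a library routine such as `librosa.feature.mfcc` is the usual choice; the explicit version only makes each step visible.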
Clone the repository:
git clone https://github.com/expertspec/profanity-predictor.git
Install all dependencies from the requirements.txt file:
pip install -r requirements.txt
/profanity-predictor
├── assets # Images for readme
├── data
│ ├── banned_words.txt
│ └── test_records
├── src # Source code
│ ├── features # Scripts for feature extraction
│ │ └── tools
│ ├── models # Model architectures and tools for using them
│ └── preprocessing # Scripts for dataset preparation
└── weights # Folder for model weights
Test records can be downloaded for a quick start.
Default weights for the prediction model can be downloaded here
Run inference on the samples from the test records:
$ python3 data_inference.py ./data/test_records --device cpu
You can also specify the "--path_to_banned_words" and "--weights" arguments.
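This README does not specify how ASR output is turned into "previous word labels". A minimal sketch, assuming `banned_words.txt` holds one word per line and that a word's label is simply whether it appears in that list, might look like this; `load_banned_words` and `previous_word_labels` are hypothetical helper names, not functions from the repository.

```python
from pathlib import Path


def load_banned_words(path):
    # Assumption: one word per line, UTF-8, blank lines ignored, case-insensitive.
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().casefold() for line in text.splitlines() if line.strip()}


def previous_word_labels(words, banned, context=3):
    """Hypothetical feature: for each word position i, return the binary labels
    (1 = banned, 0 = not banned) of the `context` words preceding it,
    left-padded with zeros at the start of the utterance."""
    flags = [1 if w.casefold() in banned else 0 for w in words]
    padded = [0] * context + flags
    # padded[j] == flags[j - context], so this slice covers words i-context .. i-1.
    return [padded[i:i + context] for i in range(len(words))]


# Example with a toy list standing in for the real banned-words file.
print(previous_word_labels(["oh", "darn", "it", "again"], {"darn"}, context=2))
```

Labels of this kind could be concatenated with the acoustic features of the current segment before they are fed to the model.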
Run inference on a live speech stream:
$ python3 stream_inference.py
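The internals of `stream_inference.py` are not described in this README. One common way to apply a fixed-input model to a live stream is a rolling buffer that emits overlapping windows as audio chunks arrive. The sketch below assumes that design (the `StreamBuffer` class and its window and hop sizes are hypothetical) and omits microphone capture entirely.

```python
import numpy as np


class StreamBuffer:
    """Hypothetical rolling buffer: accumulates incoming audio chunks and
    returns fixed-length analysis windows every `hop` samples."""

    def __init__(self, window, hop):
        if window <= 0 or hop <= 0:
            raise ValueError("window and hop must be positive")
        self.window = window
        self.hop = hop
        self.buffer = np.zeros(0, dtype=np.float32)

    def push(self, chunk):
        # Append the new chunk, then cut out every complete window available.
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, dtype=np.float32)])
        windows = []
        while len(self.buffer) >= self.window:
            windows.append(self.buffer[:self.window].copy())
            self.buffer = self.buffer[self.hop:]  # advance by one hop
        return windows


buf = StreamBuffer(window=4, hop=2)
buf.push(np.arange(3))           # not enough samples yet: returns []
print(buf.push(np.arange(3, 8)))  # three overlapping windows of 4 samples
```

Each emitted window could then be passed to the feature extractor and the model. Choosing `hop` smaller than `window` trades extra computation for shorter delays between consecutive predictions.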
The dataset is available here
Multimodal prediction of profanity based on speech analysis
- [x] Initial inference for test data
- [x] Real-time implementation
- [ ] Examples
- [ ] Tests
Funding: research project No. 622279 "Development of a service for assessing the validity of expert opinion based on dynamic intelligent analysis of video content".
@software{expertspec,
title = {profanity-predictor},
author = {Smirnov, Ivan},
year = {2023},
url = {https://github.com/expertspec/profanity-predictor},
version = {0.0.1}
}