This NLP library can help you with:
- Extracting characters' replicas (lines of dialogue) from literary texts;
- Identifying the actor each replica belongs to.
We aim to achieve the following goals:
- Better accuracy on the actor classification task (currently around 80% or lower);
- Support for more languages (only Russian is supported at the moment).
To install with pip, run the usual command from the project directory:
pip install .
As a library
An example of using the library can be found in the cli.py file.
As a CLI tool
To test the output on a text file, run:
ttc print-play path-to-the-text-file text-language
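If you want to drive the command above from a Python script (for example, to process several chapters in a batch), one minimal sketch is to call it through subprocess. It assumes the ttc entry point is on your PATH after installation and that the tool writes UTF-8 to stdout. The helper names build_print_play_command and print_play are hypothetical and not part of the library's API; the supported-language set mirrors the Notes below (only ru at the moment).

```python
import subprocess
from pathlib import Path
from typing import List, Union

# Hypothetical helpers: only the "ttc print-play <file> <language>" command
# itself comes from this README.
SUPPORTED_LANGUAGES = {"ru"}  # only Russian is supported at the moment


def build_print_play_command(text_path: Union[str, Path], language: str) -> List[str]:
    """Build the argument list for `ttc print-play`, rejecting unsupported languages."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported text-language: {language!r}")
    return ["ttc", "print-play", str(text_path), language]


def print_play(text_path: Union[str, Path], language: str) -> str:
    """Run the CLI and return its standard output.

    Assumes `ttc` is on PATH and prints UTF-8; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_print_play_command(text_path, language),
        capture_output=True,
        encoding="utf-8",  # decode stdout/stderr as UTF-8 text
        check=True,
    )
    return result.stdout
```

Usage: print(print_play("chapter1.txt", "ru")). Validating the language before spawning the process gives a clearer error than whatever the tool itself reports for an unsupported value.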
Notes
- Text must be encoded in UTF-8;
- Text must be sanitized (see #23);
- It is usually best to test on a medium-sized text (e.g. a book chapter);
- Supported text-language values are: ru (Russian).
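Russian texts can come in legacy encodings such as Windows-1251, so the UTF-8 requirement above is easy to trip over. The following is a minimal pre-flight sketch, assuming the fallback source encoding is cp1251 (change it to match your file). ensure_utf8 is a hypothetical helper, not part of this library; it overwrites the file in place and does not perform the sanitization referenced in #23.

```python
from pathlib import Path
from typing import Union


def ensure_utf8(path: Union[str, Path], fallback_encoding: str = "cp1251") -> bool:
    """Make sure the file at `path` is UTF-8 encoded.

    Returns False if the file already decodes as UTF-8 (it is left untouched).
    Otherwise decodes it with `fallback_encoding` (an assumption: cp1251 is
    only a common legacy encoding for Russian text), rewrites it as UTF-8
    and returns True.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Strict decoding: bytes undefined in the fallback encoding raise
    # UnicodeDecodeError, but a wrong guess often just yields garbled text.
    text = raw.decode(fallback_encoding)
    # Write bytes directly so the original line endings are preserved.
    file_path.write_bytes(text.encode("utf-8"))
    return True
```

Note that a short non-UTF-8 file can occasionally happen to be valid UTF-8 by accident, so it is worth eyeballing the result before running the tool on it.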
Please install Poetry.
Spawn a shell inside the project's virtual environment (Poetry creates the environment if it does not exist yet):
poetry shell
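Install the project dependencies (the bracketed part is optional and also installs the dev and large_models_ru dependency groups):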
poetry install [--with dev,large_models_ru]
Contributions are very welcome!