The repository contains the source code of the Python script, as well as Colaboratory notebooks (.ipynb) for using parts extracted from the script.
This Python script translates text from one language to another using pre-trained machine translation models, which it loads through the `transformers` library. It accepts either a text file or text from standard input.
- Language Detection: Automatically detects the language of the input text using both `langdetect` and a fastText model, ensuring accurate translation.
- Translation: Translates text to a specified target language using state-of-the-art models from Helsinki-NLP or Facebook's M2M-100 and NLLB-200 distilled models, depending on availability and compatibility.
- Device Compatibility: Automatically detects and utilizes available computational resources, preferring GPU acceleration when available for faster processing.
- Customizable Output: Allows users to specify an output file for the translated text or prints the translation to the terminal if no file is specified.
- Sentence Splitting: Splits input text into manageable sentences or segments to ensure the translation's quality and coherence, especially for longer texts.
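
Two of the features above, device selection and sentence splitting, can be illustrated with a minimal sketch. This is not the code from `translatorlab.py`: the helper names `pick_device` and `split_into_segments`, the `max_chars` limit, and the regex-based splitting are assumptions made for illustration. The script itself may split sentences differently, for example with `nltk`.

```python
import re

# Hypothetical pattern: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def pick_device() -> str:
    """Return "cuda" or "mps" when PyTorch reports an accelerator, else "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def split_into_segments(text: str, max_chars: int = 400) -> list[str]:
    """Pack whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own segment.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment can then be passed to the translation model separately, which keeps inputs short without cutting a sentence in half.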
- Python 3.x
- `transformers`
- `torch`
- `requests`
- `nltk`
- `pycountry`
- `langdetect`
- `sentencepiece`
- `fasttext`
- `tiktoken`
- `sacremoses` (optional)
- Clone the repository to your local machine.
git clone https://github.com/r1cc4rd0m4zz4/traNsLatorLaB.git
- Install the required Python packages using `pip`:
pip install -r requirements.txt
- Download and prepare any necessary models or data files as described in the script comments or documentation.
- Run the script from the command line, optionally specifying the input text file and the target language. By default, the script translates to Italian. Example reading from standard input:
pbpaste | python translatorlab.py [-o OUTPUT] [-l {it,en}] [-m {opus,m2m,m2m-418,m2m-1.2,nllb,nllb-d600,nllb-1.3,nllb-d1.3,nllb-3.3}] [-s] [txt_path]
- For direct text input or to use the script in an interactive mode, follow the instructions provided in the script's comments or use the -h flag to access help:
python translatorlab.py -h
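
The usage line above maps naturally onto `argparse`. The sketch below mirrors the flags and choices shown there, and is not the script's actual parser: the destination names, the help strings, and the `read_input` helper are hypothetical. The meaning of `-s` is not documented here, so it is modeled only as a boolean switch, and no default is assumed for `-m`.

```python
import argparse
import sys

# Model aliases copied from the usage line.
MODEL_CHOICES = [
    "opus", "m2m", "m2m-418", "m2m-1.2",
    "nllb", "nllb-d600", "nllb-1.3", "nllb-d1.3", "nllb-3.3",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="translatorlab.py")
    parser.add_argument("-o", dest="output", metavar="OUTPUT", default=None,
                        help="output file; print to the terminal if omitted")
    parser.add_argument("-l", dest="lang", choices=["it", "en"], default="it",
                        help="target language (Italian by default)")
    parser.add_argument("-m", dest="model", choices=MODEL_CHOICES, default=None,
                        help="translation model alias")
    # Assumption: -s is a plain on/off switch; its purpose is not described here.
    parser.add_argument("-s", dest="s_flag", action="store_true",
                        help="see the script's own -h output")
    parser.add_argument("txt_path", nargs="?", default=None,
                        help="input text file; read standard input if omitted")
    return parser


def read_input(txt_path):
    """Return the text to translate from a file, or from stdin when no path is given."""
    if txt_path is None:
        return sys.stdin.read()
    with open(txt_path, encoding="utf-8") as handle:
        return handle.read()
```

With a parser like this, `pbpaste | python translatorlab.py -l en` leaves `txt_path` as `None`, so `read_input` falls back to standard input.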
Use of the TraNsLatorLaB machine translation script is at your own risk. The author of the code assumes no liability for any damage or loss resulting from its use.
In addition, use of the script may be subject to local or international laws and regulations. It is the user's responsibility to verify that their use of the script complies with applicable laws and regulations.
Finally, the author of the code does not guarantee the security of the script or its compliance with privacy or data security regulations. It is your responsibility to ensure the security and privacy of your data and to use the script in compliance with applicable regulations.
- Helsinki-NLP for the translation models
- Facebook AI for the M2M-100 and NLLB-200 models
- The transformers library by Hugging Face
- The PyTorch team for providing an open-source machine learning library for Python
- The requests library for providing a simple interface for making HTTP requests
- The Natural Language Toolkit (`nltk`) team for providing essential NLP tools and resources
- The pycountry library for providing ISO country code utilities and data
- The langdetect library for providing language detection capabilities
- The sentencepiece library for providing efficient subword tokenization
- The fasttext library for providing fast and accurate language identification
- The sacremoses library for providing tokenization and detokenization utilities
- The tiktoken library for providing efficient tokenization tools