Speech Translate is a practical application that combines OpenAI's Whisper ASR model with free translation APIs. It handles both real-time (live) speech-to-text and speech translation, converting spoken language into written text. It can also import and transcribe audio and video files.
Speech Translate aims to extend Whisper's capabilities by pairing it with translation APIs and wrapping both in a simple, easy-to-use interface. The application is open source, so contributions are welcome.
Preview - Usage
Transcribe mode on detached window (English)
Translate mode on detached window (English to Indonesian)
- Features
- Requirements
- Installation
- More Information
- Development
- Contributing
- License
- Attribution
- Other
- Speech to text and/or speech translation (transcribed text can be translated to other languages) with live input from mic or speaker
- Customizable subtitle window for live speech to text and/or speech translation
- Batch file processing of audio / video files for transcription and translation with output of (.txt .srt .ass .tsv .vtt .json)
- Result refinement
- Result alignment
- Result translation (Translate only the result.json)
- Compatible OS Installation:
OS | Installation from Prebuilt binary | Installation as a Module | Installation from Git |
---|---|---|---|
Windows | ✔️ | ✔️ | ✔️ |
MacOS | ❌ | ✔️ | ✔️ |
Linux | ❌ | ✔️ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above. Alternatively, you can create a loopback that captures your system audio as a virtual input (like a mic input) by using one of these guides/tools: [Voicemeeter on Windows]/[YT Tutorial] - [pavucontrol on Ubuntu with PulseAudio] - [blackhole on MacOS]
- An internet connection is needed only for translation with an API and for downloading models. (If you want to go fully offline, you can set up LibreTranslate on your local machine and configure it in the app settings.)
- Recommended to have the Segoe UI font installed on your system for the best UI experience (for operating systems other than Windows, you can see this: Ubuntu - MacOS)
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) for faster results. Each Whisper model has different requirements; for more information, check the Whisper repository directly.
Size | Parameters | Required VRAM | Relative speed |
---|---|---|---|
tiny | 39 M | ~1 GB | ~32x |
base | 74 M | ~1 GB | ~16x |
small | 244 M | ~2 GB | ~6x |
medium | 769 M | ~5 GB | ~2x |
large | 1550 M | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection and a tooltip will show the model info). Also note that with faster-whisper, models run significantly faster and use less VRAM; for more information, please visit the faster-whisper repository.
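As a rough illustration of how to read the table above, the sketch below picks the largest model whose listed VRAM need fits a given budget. The function name and the dictionary are hypothetical helpers written for this example, not part of the app, and the numbers are the approximate values from the table (faster-whisper typically needs less).

```python
# Approximate VRAM needs in GB, copied from the table above.
# Ordered from smallest to largest model.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_for(vram_gb):
    """Return the largest model whose listed VRAM need fits, or None.

    Hypothetical helper for illustration only.
    """
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (0.5, 2, 6, 12):
        print(budget, "GB ->", largest_model_for(budget))
```

Treat the result as a starting point only; actual memory use depends on the backend and settings.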
Important
Please take a look at the Requirements first before installing. For more information about the usage of the app, please check the wiki
Note
The prebuilt binary ships with CUDA 11.8, so it will only work with GPUs that are compatible with CUDA 11.8. If your GPU is not compatible, you can try installing as a module or from git instead.
- Download the latest release (There are 2 versions, CPU and GPU/CUDA)
- Install/extract the downloaded file
- Run the program
- Set the settings to your liking
- Enjoy!
Note
Use Python 3.11 for best compatibility and performance
Warning
You might need to have Build tools for Visual Studio (or the equivalent of it on your OS) installed
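Before installing, it can help to confirm that the interpreter meets the version requirement (Python 3.8 or later, 3.11 recommended). The following is a minimal sketch using only the standard library; the function name and return labels are made up for this example.

```python
import sys

MINIMUM = (3, 8)       # lowest version the README lists as supported
RECOMMENDED = (3, 11)  # version the README recommends


def check_python(version=None):
    """Classify a (major, minor, ...) version tuple; defaults to the running interpreter.

    Hypothetical helper for illustration only.
    """
    version = tuple(version or sys.version_info[:2])
    if version[:2] < MINIMUM:
        return "unsupported"
    if version[:2] == RECOMMENDED:
        return "recommended"
    return "supported"


if __name__ == "__main__":
    print("This interpreter is", check_python())
```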
To install as a module, use pip with one of the following commands.
- Install with GPU (CUDA-compatible) support:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
cu118 here means CUDA 11.8; you can change it to another version if you need to. You can check older versions of PyTorch here or here.
- CPU only:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git
You can then run the program by typing speech-translate in your terminal/console. Alternatively, when installing as a module, you can also clone the repo and install it locally by running pip install -e . in the project directory. (Don't forget to add --extra-index-url if you want to install with GPU support.)
Notes For Installation as Module:
- If you are updating from an older version, you need to add --upgrade --force-reinstall at the end of the command. If the update does not need new dependencies, you can also add --no-deps at the end of the command to speed up the installation.
- If you want to install from a specific branch or commit, you can do so by adding @branch_name or @commit_hash at the end of the URL. Example: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118
- The --extra-index-url here selects the CUDA version. If your device is not compatible or you need another version of CUDA, you can check older versions of PyTorch here or here.
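The options in the notes above combine in a predictable way, which the sketch below spells out by assembling the pip command as a string. The helper and its parameter names are hypothetical and exist only for this example; the URLs and flags are the ones shown in this section.

```python
REPO_URL = "git+https://github.com/Dadangdut33/Speech-Translate.git"
PYTORCH_INDEX = "https://download.pytorch.org/whl/"


def build_pip_command(ref=None, cuda_tag=None, force_reinstall=False, no_deps=False):
    """Assemble the install command described in the notes (illustrative helper).

    ref:             branch name or commit hash appended as @ref, or None
    cuda_tag:        e.g. "cu118" for CUDA 11.8, or None for a CPU-only install
    force_reinstall: add --upgrade --force-reinstall when updating an old version
    no_deps:         add --no-deps when no new dependencies are needed
    """
    url = REPO_URL if ref is None else f"{REPO_URL}@{ref}"
    parts = ["pip", "install", "-U", url]
    if cuda_tag is not None:
        parts += ["--extra-index-url", PYTORCH_INDEX + cuda_tag]
    if force_reinstall:
        parts += ["--upgrade", "--force-reinstall"]
    if no_deps:
        parts.append("--no-deps")
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the example from the notes: dev branch with CUDA 11.8.
    print(build_pip_command(ref="dev", cuda_tag="cu118"))
```

The function only builds the string; run the printed command yourself in the environment where you want the app installed.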
If you prefer cloning the app directly from git/github, you can follow the guide in development (wiki) or below. Doing it this way might also provide a more stable environment.
Check out the wiki for more information about the app, user settings, how to use it, and more.
Note
Check the wiki for more details
Note
It is recommended to create a virtual environment, but it is not required. I use Python 3.11.6 for development, but it should work with Python 3.8 or later
Warning
You might need to have Build tools for Visual Studio installed
- Clone the repo with its submodules by running
git clone --recurse-submodules https://github.com/Dadangdut33/Speech-Translate.git
- cd into the project directory
- Create a virtual environment by running
python -m venv venv
- Activate your virtual environment
- Install all the dependencies needed by running
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
if you are using GPU, or pip install -r requirements.txt if you are using CPU.
- Run python Run.py in the root directory to start the app.
Notes:
- If you forgot the --recurse-submodules flag when cloning the repository and the submodules were not cloned correctly, you can run git submodule update --init --recursive in the project directory to pull the needed submodules.
- The --extra-index-url is needed to install the CUDA version of PyTorch; here we are using CUDA 11.8. If your device is not compatible or you need another version of CUDA, you can check the previous PyTorch versions in this link or this.
You can run the app by running Run.py, located in the root directory. Alternatively, you can run it with python -m speech_translate in the root directory.
Before compiling the project, make sure you have installed all the dependencies and set up PyTorch correctly. Your PyTorch build determines whether the app will use the GPU or the CPU (that's why it's recommended to create a virtual environment for the project).
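To see which kind of PyTorch build is active in the current environment, a quick check along these lines may help. It relies on torch.cuda.is_available() from PyTorch; the wrapper function and its return labels are hypothetical, and whether the app performs the same check internally is an assumption not confirmed here.

```python
def describe_torch_device():
    """Report whether the installed PyTorch can see a CUDA device.

    Illustrative helper; returns one of three labels.
    """
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "torch not installed"
    # True only for a CUDA build of PyTorch with a usable GPU and driver.
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", describe_torch_device())
```

If this reports cpu on a machine with a supported GPU, the CPU-only PyTorch wheel was probably installed, and reinstalling with the --extra-index-url option is the likely fix.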
The precompiled version of this project is built using cx_Freeze; the build script is provided in build.py. This script is only configured for Windows builds at the moment, but feel free to contribute if you know how to build properly for other operating systems.
To compile it into an exe, run python build.py build_exe in the root directory. This will produce a folder containing the compiled project alongside an executable in the build directory. After that, use the Inno Setup script to create an installer. You can use the provided installer.iss to create the installer.
This project should be compatible with Windows (preferably Windows 10 or later) and other platforms, but I haven't tested it extensively on other platforms. If you find any bugs or issues, feel free to create an issue.
Feel free to contribute to this project by forking the repository, making your changes, and submitting a pull request. You can also contribute by creating an issue if you find a bug or have a feature request. Also, feel free to give this project a star if you like it.
This project is licensed under the MIT License - see the LICENSE file for details
- Sunvalley TTK Theme (used for the app theme, although I modified it a bit)
- Noto Emoji for the icons used in the app
Check out my other, similar project, Screen Translate, a screen translator / OCR tool made possible by Tesseract.