This repository is dedicated to customizing and training VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) for Vietnamese text-to-speech (TTS), built on the Coqui TTS framework. It contains the code and resources needed to train VITS to generate high-quality speech from Vietnamese text.
- I highly recommend using a conda virtual environment with Python 3.11.5.
```shell
conda create -n vits python=3.11.5
conda activate vits
```
- In this repo, I use TTS framework version 0.17.5 for stability.
```shell
pip install TTS==0.17.5
```
```python
from TTS.api import TTS

# Load the trained VITS model from a local checkpoint and its config
tts = TTS(model_path="path to the .pth file",
          config_path="path to the config.json file")

# Synthesize speech and write it to a WAV file
tts.tts_to_file(text="Your example text", file_path="your_filename.wav")
```
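If you want to synthesize many sentences at once, a small helper can loop over them and write one numbered WAV file per sentence. This is a minimal sketch, not part of the repo: `synthesize_batch` is a hypothetical name, and `tts` is assumed to be a `TTS.api.TTS` instance created as shown above.

```python
from pathlib import Path

def synthesize_batch(tts, sentences, out_dir="wavs"):
    """Synthesize each sentence to its own numbered WAV file.

    `tts` is assumed to expose `tts_to_file(text=..., file_path=...)`,
    as the Coqui TTS API object above does. Returns the output paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(sentences):
        path = out / f"utt_{i:03d}.wav"
        tts.tts_to_file(text=text, file_path=str(path))
        paths.append(str(path))
    return paths
```

Keeping one file per utterance makes it easy to inspect individual outputs and to build a small evaluation set for the model.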
My trained model is published on this Hugging Face Space. Due to limited training resources, the results are not yet as good as expected. The next goal is to collect personal voice data to implement voice cloning.