This repository is dedicated to customizing and training VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) for Vietnamese text-to-speech (TTS), built on the Coqui TTS framework. It contains the code and resources needed to train VITS to generate high-quality speech from Vietnamese text.
- I highly recommend using a conda virtual environment with Python 3.11.5.
```shell
conda create -n vits python=3.11.5
conda activate vits
```
- In this repo, I use TTS framework version 0.17.5 for stability.
```shell
pip install TTS==0.17.5
```
```python
from TTS.api import TTS

# Load the trained VITS model from a local checkpoint and its config
tts = TTS(model_path="path to the .pth file",
          config_path="path to the config.json file")

# Synthesize speech and write it to a WAV file
tts.tts_to_file(text="Your example text", file_path="your_filename.wav")
```
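If you want to synthesize many sentences at once, a small helper can loop over them and write one numbered WAV file per sentence. This is a minimal sketch, not part of the repo: `synthesize_batch` is a hypothetical name, and `tts` is assumed to be a `TTS.api.TTS` instance created as shown above.

```python
from pathlib import Path

def synthesize_batch(tts, sentences, out_dir="wavs"):
    """Synthesize each sentence to its own numbered WAV file.

    `tts` is assumed to expose `tts_to_file(text=..., file_path=...)`,
    as the Coqui TTS API object above does. Returns the output paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(sentences):
        path = out / f"utt_{i:03d}.wav"
        tts.tts_to_file(text=text, file_path=str(path))
        paths.append(str(path))
    return paths
```

Keeping one file per utterance makes it easy to inspect individual outputs and to build a small evaluation set for the model.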
My trained model is published on this Hugging Face Space. Due to limited training resources, the results are not yet as good as expected. The next goal is to collect personal voice data to implement voice cloning.