xcmyz/FastSpeech

FastSpeech-Pytorch

An implementation of FastSpeech based on PyTorch.

Update (2020/07/20)

  1. Optimized the training process.
  2. Optimized the implementation of the length regulator.
  3. Use the same hyperparameters as FastSpeech2.
  4. Changes 1, 2 and 3 together make training 3 times faster than before.
  5. Better speech quality.
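
For readers unfamiliar with the length regulator mentioned in item 2: in FastSpeech it expands each phoneme-level hidden vector by that phoneme's duration (in mel frames), so that the sequence length matches the mel-spectrogram. The following is a minimal, framework-free sketch of that idea, not the code used in this repository; the function name length_regulate and the toy inputs are illustrative assumptions.

```python
def length_regulate(hidden, durations, alpha=1.0):
    """Repeat each phoneme-level vector according to its duration.

    hidden    -- list of per-phoneme vectors (hypothetical toy input)
    durations -- list of non-negative frame counts, one per phoneme
    alpha     -- duration scale; values above 1.0 lengthen (slow down) the output
    """
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    expanded = []
    for vec, dur in zip(hidden, durations):
        n_frames = max(0, int(round(dur * alpha)))
        # A phoneme with duration 0 contributes no frames at all.
        expanded.extend([vec] * n_frames)
    return expanded


hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]
frames = length_regulate(hidden, durations)
print(len(frames))  # 2 + 0 + 3 frames
```

In PyTorch, the same per-phoneme expansion for a single utterance can be expressed with torch.repeat_interleave along the time dimension; batching additionally requires padding the expanded sequences to a common length.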

Model

My Blog

Prepare Dataset

  1. Download and extract the LJSpeech dataset.
  2. Put the LJSpeech dataset in data.
  3. Unzip alignments.zip.
  4. Put the NVIDIA pretrained WaveGlow model in waveglow/pretrained_model and rename it to waveglow_256channels.pt.
  5. Run python3 preprocess.py.
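
Before running preprocess.py, it can help to confirm that the inputs from the steps above are in place. The sketch below is not part of the repository: the helper name missing_inputs is hypothetical, and it assumes that data, alignments.zip and the waveglow directory all sit in the repository root.

```python
from pathlib import Path


def missing_inputs(root="."):
    """Return the expected input paths that do not exist under root.

    The layout checked here is an assumption based on the steps above.
    """
    root = Path(root)
    required = [
        root / "data",            # LJSpeech dataset directory
        root / "alignments.zip",  # alignment archive to unzip
        root / "waveglow" / "pretrained_model" / "waveglow_256channels.pt",
    ]
    return [str(path) for path in required if not path.exists()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected inputs found.")
```

This only checks that the paths exist; it does not verify the contents of the dataset or the checkpoint.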

Training

Run python3 train.py.

Evaluation

Run python3 eval.py.

Notes

  • In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I use Tacotron2 instead.
  • I use the same hyperparameters as FastSpeech2.
  • Audio examples are in sample.
  • A pretrained model is available.

Reference

Repository

Paper