synthesized wavs of long texts #7

Liujingxiu23 · 2021-07-19T01:52:16Z

I downloaded the pretrained databaker model and synthesized wavs using inference.py.
The results are not very good: the alignment is wrong, especially when the input text is long.
For example, for "失恋的人特别喜欢往人烟罕至的角落里钻。" ("People who have just been through a breakup especially like to hide away in deserted corners."), the synthesized wav sounds like:
失恋的人特别喜欢往人烟罕至的_角角落里钻钻钻钻_

For longer input text, the synthesized wavs are totally wrong.

light1726 · 2021-07-19T10:35:42Z

Hi @Liujingxiu23, thanks for your feedback. Attention errors can happen with vaenar-tts because nothing constrains the attention alignment to be monotonic; most of these errors are repetitions of phonemes. From my observation, such cases are rare. I have never seen a synthesized waveform that was totally wrong for a sentence.
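One rough way to spot such repetitions is to count how often the most-attended input position moves backwards across output frames. This is only a sketch. It assumes the alignment can be exported as a list of rows, one per output frame, each holding the attention weights over input tokens, which this thread does not confirm. `count_backward_jumps` is a hypothetical helper, not part of the repository.

```python
def count_backward_jumps(alignment):
    """Count output frames whose most-attended input index is lower
    than that of the previous frame.

    alignment: list of rows (assumed layout: one row per output frame,
    one weight per input token).
    """
    # Index of the largest weight in each row.
    attended = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    # A monotonic alignment never steps back to an earlier token.
    return sum(1 for prev, cur in zip(attended, attended[1:]) if cur < prev)


monotonic = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.0, 0.1, 0.9]]
repeating = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8],
             [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
print(count_backward_jumps(monotonic), count_backward_jumps(repeating))
```

A count above zero flags an utterance worth listening to; it does not by itself prove that a phoneme was audibly repeated.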

Synthesis of long sentences is more challenging as there are not many long sentences in the training set.

I didn't do much parameter-tuning on the Mandarin dataset. I think there are at least 2 points that can be considered to improve the performance of Mandarin TTS:

1. Use phonemes as input, or split each Pinyin syllable into consonant and vowel, instead of treating the Pinyin as a plain character sequence as I do.
2. For the synthesis of out-of-dataset texts, predict prosodic boundaries and mark them in the input the same way they are marked in the training transcriptions.
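The first point could look roughly like the sketch below. It assumes the input is space-separated Pinyin with tone digits (for example `jiao3 luo4`), and it treats `y` and `w` as initials for simplicity. The names `INITIALS`, `split_syllable`, and `to_units` are illustrative only and are not taken from the repository.

```python
# Two-letter initials come first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split one syllable into (initial, final), e.g. 'zhi4' -> ('zh', 'i4')."""
    for initial in INITIALS:
        rest = syllable[len(initial):]
        # Require a letter after the initial so 'n2' or 'r5' stay whole.
        if syllable.startswith(initial) and rest and rest[0].isalpha():
            return initial, rest
    # Zero-initial syllables such as 'an1' or 'er2' have no consonant part.
    return "", syllable


def to_units(pinyin_text):
    """Turn 'jiao3 luo4' into ['j', 'iao3', 'l', 'uo4']."""
    units = []
    for syllable in pinyin_text.split():
        initial, final = split_syllable(syllable)
        if initial:
            units.append(initial)
        units.append(final)
    return units


print(to_units("jiao3 luo4 li3 zuan1"))
```

Keeping the tone digit on the final gives the model one symbol per toned vowel; whether that works better than a separate tone symbol would need to be checked experimentally.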

Liujingxiu23 · 2021-07-19T10:42:43Z

@light1726 Thank you very much for your reply. I will train on my Mandarin dataset with phone sequences and prosodic boundary information to see how it performs.
