I downloaded the pretrained Databaker model and synthesized wavs using inference.py.
The results are not very good: the alignment is wrong, especially when the input text is long.
For example, for the input "失恋的人特别喜欢往人烟罕至的角落里钻。" ("Heartbroken people especially like to hide away in secluded corners."), the synthesized wav sounds like:
失恋的人特别喜欢往人烟罕至的_角角落里钻钻钻钻_
For longer input texts, the synthesized wavs are completely wrong.
Hi @Liujingxiu23, thanks for your feedback. Attention errors can happen with vaenar-tts because no constraint is imposed to keep the attention alignment monotonic; most of these errors are repeated phonemes. In my experience such cases are rare, and I have never seen a sentence whose synthesized waveform was completely wrong.
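A rough way to spot the repetitions described above is to check whether the attention peak ever moves backwards. This is a heuristic sketch, not part of vaenar-tts: it assumes you can extract the alignment as a decoder-frame × input-token weight matrix (here plain nested lists), and whether or how inference.py exposes that matrix is not stated in this thread. The function names are hypothetical.

```python
# Rough heuristic, not part of vaenar-tts: flag decoder frames where the
# attention peak jumps back to an earlier input token (a non-monotonic step).

def attention_path(alignment):
    """Index of the most-attended input token for each decoder frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in alignment]


def find_backward_jumps(alignment):
    """Decoder-frame indices where the attention peak moves backwards."""
    path = attention_path(alignment)
    return [t for t in range(1, len(path)) if path[t] < path[t - 1]]


# Toy alignment: 5 decoder frames x 3 input tokens (rows = frames).
toy = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.8, 0.1],  # peak jumps back to token 1: likely heard as a repetition
    [0.0, 0.2, 0.8],
]
print(attention_path(toy))       # [0, 1, 2, 1, 2]
print(find_backward_jumps(toy))  # [3]
```

This only catches backward jumps; attention that stalls on one token for too many frames (which can also sound like a repeated syllable) would need a separate duration check.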
Synthesizing long sentences is more challenging because the training set contains few long sentences.
I didn't do much parameter tuning on the Mandarin dataset. I think there are at least two ways to improve Mandarin TTS performance:
Use phonemes as input, or split Pinyin into consonants (initials) and vowels (finals), instead of treating it as a plain character sequence as I do.
For out-of-dataset texts, predict prosodic boundaries in the same way they are annotated in the training transcriptions.
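The first suggestion can be sketched as a small helper that splits tone-numbered Pinyin syllables into an initial and a final. This is a minimal illustration, not the project's front end: the tone-number format (e.g. "shi1"), the function names, and the choice to treat y/w as initials are all assumptions; other schemes rewrite y/w syllables instead (e.g. "yi" as "i").

```python
# Hypothetical helper: split tone-numbered Pinyin (e.g. "shi1") into
# an initial (consonant) and a final (vowel part that keeps the tone digit).
INITIALS = [
    "zh", "ch", "sh",  # two-letter initials must be tried before z/c/s
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s",
    "y", "w",  # simplification: y/w treated as initials
]


def split_syllable(syllable: str) -> tuple[str, str]:
    """Return (initial, final); initial is "" for zero-initial syllables like "ai4"."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return ini, syllable[len(ini):]
    return "", syllable


def pinyin_to_phones(pinyin: str) -> list[str]:
    """Turn space-separated Pinyin into a flat initial/final sequence."""
    phones = []
    for syl in pinyin.split():
        ini, fin = split_syllable(syl)
        if ini:
            phones.append(ini)
        phones.append(fin)
    return phones


print(pinyin_to_phones("shi1 lian4 de5 ren2"))
# ['sh', 'i1', 'l', 'ian4', 'd', 'e5', 'r', 'en2']
```

Keeping the tone on the final keeps the symbol inventory small compared with one token per toned syllable.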
@light1726 Thank you very much for your reply. I will train on my Mandarin dataset with phone sequences and prosodic boundary information and see how it performs.
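To use prosodic boundary information as a model input, the annotated transcription first has to be turned into per-character labels. The sketch below assumes Databaker-style markers "#1" to "#4" placed after a character, with higher numbers meaning stronger boundaries; the annotation string in the example is made up, and the function name is hypothetical. Punctuation, if present, is kept as an ordinary character with level 0.

```python
import re

# Hypothetical parser for prosody-annotated text where "#1".."#4" mark
# increasingly strong boundaries after the preceding character.
MARKER = re.compile(r"(#[1-4])")


def char_boundaries(annotated: str) -> list[tuple[str, int]]:
    """Return (character, boundary level after it); 0 means no marked boundary."""
    pairs: list[tuple[str, int]] = []
    for piece in MARKER.split(annotated):  # capture group keeps the markers
        if MARKER.fullmatch(piece):
            if pairs:  # attach the level to the preceding character
                pairs[-1] = (pairs[-1][0], int(piece[1]))
        else:
            pairs.extend((ch, 0) for ch in piece)
    return pairs


print(char_boundaries("失恋#1的人#2钻#4"))
# [('失', 0), ('恋', 1), ('的', 0), ('人', 2), ('钻', 4)]
```

The levels can then be inserted as extra tokens after the phones of the corresponding syllable, or kept as a parallel feature sequence.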