Why doesn't TTS inference need masking? #146

YuXiangLo · 2024-07-11T08:39:03Z

As the title says: if we don't mask the audio (namely y) at inference time, how does the model know that TTS is supposed to be performed?

zmy1116 · 2024-08-04T23:33:33Z

I want to ask this too. I haven't tested it yet, but I wonder how the results would differ if I changed the end part to be mask0 EOS mask0 empty.

zmy1116 · 2024-08-04T23:38:53Z

Oh, I think I understand now, based on Jason's answer to a different question: for zero-shot TTS, it looks like a different model is trained without the causal mask. You can see that there are two different sets of weights, one for editing and one for TTS.
