Why doesn't TTS inference need masking? #146

Open
YuXiangLo opened this issue Jul 11, 2024 · 2 comments
@YuXiangLo

As the title says: if we don't mask the audio, namely `y`, how does the model know that TTS is supposed to be performed?

@zmy1116

zmy1116 commented Aug 4, 2024

I want to ask this too. I haven't tested it yet, but I wonder how the results would differ if I changed the end part to be mask0 EOS mask0 empty.

@zmy1116

zmy1116 commented Aug 4, 2024

Oh, I think I understand now, based on Jason's answer to a different question: for zero-shot TTS, it looks like a DIFFERENT model is trained, without the causal mask. You can see that there are two different sets of weights for editing and TTS!
