English dataset creation #4

Open
juhonkang opened this issue Jun 25, 2023 · 3 comments

Comments

@juhonkang

The workflow involves many manual dataset creation steps. Is there any way to automate them?

Also, could we connect by email? I have several other questions.

@yqzhishen
Member

There is currently no mature automatic dataset creation workflow for English.

For Chinese and Japanese, we use Montreal Forced Aligner (MFA) to get phoneme durations from lyrics. This requires models pretrained on a large singing corpus (~50h), and we haven't done that for English yet.

@dutchsing009

@yqzhishen Hey, how are you? I have about 40-50 hours of private English singing data. Can you guide me on how to train an MFA model on it, and what the requirements are? For example, do these 40-50 hours need to be transcribed or meet some other condition? Thanks, can't wait to hear from you!

@yqzhishen
Member

@dutchsing009 The training data for MFA must be transcribed. You may need an English pronunciation dictionary as well. Please refer to the official MFA documentation: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner
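
To illustrate the two requirements above (transcripts and a dictionary), here is a rough pre-flight check in Python. It is only a sketch, not part of this project or of MFA. It assumes each `.wav` file sits next to a same-named `.lab` transcript and that the dictionary is a plain text file with one `WORD PH1 PH2 ...` entry per line. The function names `load_dictionary_words` and `check_corpus` are hypothetical. MFA's own tooling covers this more thoroughly, so treat the official documentation as authoritative.

```python
from pathlib import Path


def load_dictionary_words(dict_path):
    """Collect the words of a pronunciation dictionary.

    Assumed format (an assumption of this sketch): one entry per line,
    the word first, followed by its phonemes, separated by whitespace.
    """
    words = set()
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def check_corpus(corpus_dir, dict_path, audio_ext=".wav", text_ext=".lab"):
    """Return (audio files without a transcript, sorted out-of-dictionary words)."""
    vocab = load_dictionary_words(dict_path)
    missing_transcripts = []
    oov_words = set()
    for audio in sorted(Path(corpus_dir).rglob(f"*{audio_ext}")):
        # The transcript is expected next to the audio, with the same base name.
        transcript = audio.with_suffix(text_ext)
        if not transcript.exists():
            missing_transcripts.append(audio.name)
            continue
        for word in transcript.read_text(encoding="utf-8").lower().split():
            # Strip common punctuation so "world," still matches "world".
            token = word.strip(".,!?;:\"()")
            if token and token not in vocab:
                oov_words.add(token)
    return missing_transcripts, sorted(oov_words)
```

Calling `check_corpus("my_corpus", "english.dict")` (both paths are placeholders) returns the recordings that still need transcripts and the words that would need dictionary entries before training.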
