English dataset creation #4

juhonkang · 2023-06-25T06:55:29Z

I have seen many manual dataset creation steps in the workflow. Is there any way to do that automatically?

Also, could we connect by email? I have several other questions.

yqzhishen · 2023-06-25T07:04:06Z

There is currently no mature automatic dataset creation workflow for English...

For Chinese and Japanese, we use Montreal Forced Aligner (MFA) to get phoneme durations from the lyrics. This requires models pretrained on a large singing corpus (~50 h), and we haven't trained one for English yet.

dutchsing009 · 2023-08-28T17:19:37Z

@yqzhishen Hey, how are you? I have about 40-50 hours of private English singing data. Can you guide me on how to train an MFA model on it, and what the requirements are? For example, do these 40-50 hours need to be transcribed or prepared in a particular way? Thanks, can't wait to hear from you!

yqzhishen · 2023-08-31T16:25:42Z

@dutchsing009 MFA training data must be transcribed, and you may need an English pronunciation dictionary as well. Please refer to the official MFA documentation: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner
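Before training, it can help to check that every recording has a transcript and that every transcribed word is covered by the dictionary. Below is a minimal sketch, assuming the common MFA corpus layout (each `.wav` file has a sibling `.lab` text transcript with the same base name) and a plain-text dictionary with one entry per line, the word first and its phones after it. The function names `load_dictionary` and `check_corpus` are hypothetical helpers, not part of MFA.

```python
import re
from pathlib import Path


def load_dictionary(dict_path):
    """Read a pronunciation dictionary (hypothetical helper).

    Assumes one entry per line with the word as the first
    whitespace-separated token, followed by its phones.
    """
    words = set()
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def check_corpus(corpus_dir, dictionary_words):
    """Return (wav files with no .lab transcript, sorted out-of-dictionary words).

    Assumes each transcript sits next to its audio as <name>.lab.
    """
    missing = []
    oov = set()
    for wav in sorted(Path(corpus_dir).rglob("*.wav")):
        lab = wav.with_suffix(".lab")
        if not lab.exists():
            missing.append(wav.name)
            continue
        text = lab.read_text(encoding="utf-8").lower()
        # Crude word split: letters and apostrophes only.
        for word in re.findall(r"[a-z']+", text):
            if word not in dictionary_words:
                oov.add(word)
    return missing, sorted(oov)
```

Any missing transcripts or out-of-dictionary words reported here would need to be fixed (or the words added to the dictionary) before training. MFA itself also provides `mfa validate` and `mfa train` commands for this; check the documentation for the exact arguments in your MFA version.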
