
How to prepare dataset for training #13

Open
maisfeldchen opened this issue Nov 29, 2024 · 3 comments

@maisfeldchen

maisfeldchen commented Nov 29, 2024

First of all, thank you for this repo! The demo results sound very good already!

I'd like to train my own model with LQ and HQ pairs as a dataset, to get the best results for my specific use case, but I'm a bit lost when it comes to actually preparing it. Could you please provide more detailed instructions?

My questions, for example, are:

  • I assume the "hdf5_datas" folder contains the packed HDF5 archives, correct?
  • Do the codec options in the config YAML have any effect? Is it necessary to adjust them to the dataset? I'd assume they don't.
  • What is the exact folder structure within the HDF5 archives? Looking at the code in the MusdbMoisesdbDataset class, it seems like there are no subfolders, and the pairs are only distinguished by "ori_" and "codec_" at the beginning of each file name.
  • Can I pack everything into just one HDF5 archive, and simply point the config to the "hdf5_datas" folder?
  • Do codec and original files need to be exactly in sync, or can they differ by a few milliseconds?

I'd be really thankful if you could give me some insight on this!

@JusperLee
Owner

Thank you for your questions! I’m happy to help clarify things.

  1. Yes, you are correct. The "hdf5_datas" folder contains the packed HDF5 archives with processed data.

  2. As for the codec options in the config YAML, they likely don't have a significant effect on the dataset. You can generally stick with the default settings unless you have a specific need to change them.

  3. The folder structure within the HDF5 archives stores audio that has been segmented via VAD (Voice Activity Detection) under the "data" key. These files are raw, uncompressed audio. Compression (e.g., codec-specific formats) is applied in the Dataset class later.

  4. I would not recommend packing everything into a single HDF5 archive, as this could cause memory issues, especially with large datasets. It's better to split the data into multiple HDF5 archives.

  5. Currently, I am using perfectly synchronized codec and original files. While I think slight misalignment (a few milliseconds) may still work, it could potentially affect the results, so synchronization is ideal.
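Points 3 and 4 could be sketched roughly like this. This is a minimal sketch, not the repo's actual packing script: the `train_{shard}.h5` file naming and the numbered `data/{i}` sub-keys are my assumptions; only the raw-audio-under-`"data"` layout and the split into several archives come from the answers above.

```python
import os

import h5py
import numpy as np


def pack_segments(segments, out_dir, per_archive=100):
    """Write raw (uncompressed) VAD-segmented audio under a "data" group,
    split across several HDF5 archives to keep each file small.

    segments: list of 1-D numpy arrays (one VAD segment each).
    Returns the list of archive paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for shard, start in enumerate(range(0, len(segments), per_archive)):
        # Assumed naming scheme -- adjust to whatever the config expects.
        path = os.path.join(out_dir, f"train_{shard}.h5")
        with h5py.File(path, "w") as f:
            for i, seg in enumerate(segments[start:start + per_archive]):
                f.create_dataset(f"data/{i}", data=np.asarray(seg))
        paths.append(path)
    return paths
```

Codec compression is deliberately not applied here, since (per point 3) that happens later inside the Dataset class.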

I hope this helps! Let me know if you have any further questions.

@maisfeldchen
Author

Thank you, this cleared up a few things. However, your answer to my 4th question honestly raised more questions than it answered...

How would I go about creating these structured HDF5 archives exactly? Do you happen to have a script that you used to create them?

@JusperLee
Owner

[image attachment] You can structure your own dataset in this way.
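In case the attached image is unavailable: here is a hedged sketch of one way to build such an archive from .wav files, under the assumptions stated earlier (raw audio under the "data" key, compression applied later in the Dataset class). The helper names and the `data/{i}` key layout are mine, not the repo's actual script.

```python
import os
import wave

import h5py
import numpy as np


def wav_to_array(path):
    """Read a mono 16-bit PCM .wav into float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0


def build_archive(wav_paths, out_path):
    """Pack the raw audio of each file under the archive's "data" key.

    Codec compression is not done here; it happens later in the
    Dataset class, as described above.
    """
    with h5py.File(out_path, "w") as f:
        for i, p in enumerate(wav_paths):
            f.create_dataset(f"data/{i}", data=wav_to_array(p))
```

Usage would be something like `build_archive(sorted(glob.glob("segments/*.wav")), "hdf5_datas/train_0.h5")`, repeated per shard so no single archive grows too large.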
