
How to prepare dataset for training #13

Open
maisfeldchen opened this issue Nov 29, 2024 · 3 comments

@maisfeldchen

maisfeldchen commented Nov 29, 2024

First of all, thank you for this repo! The demo results sound very good already!

I'd like to train my own model with LQ and HQ pairs as a dataset, to get the best results for my specific use case, but I'm a bit lost when it comes to actually preparing it. Could you please provide more detailed instructions?

My questions, for example, are:

  • I assume the "hdf5_datas" folder contains the packed HDF5 archives, correct?
  • Do the codec options in the config YAML have any effect? Is it necessary to adjust them to the dataset? I'd assume they don't.
  • What is the exact folder structure within the HDF5 archives? Looking at the code in the MusdbMoisesdbDataset class, it seems like there are no subfolders, and the pairs are only distinguished by "ori_" and "codec_" at the beginning of each file name.
  • Can I pack everything into just one HDF5 archive, and simply point the config to the "hdf5_datas" folder?
  • Do codec and original files need to be exactly in sync, or can they differ by a few milliseconds?

I'd be really thankful if you could give me some insight on this!

@JusperLee
Owner

Thank you for your questions! I’m happy to help clarify things.

  1. Yes, you are correct. The "hdf5_datas" folder contains the packed HDF5 archives with processed data.

  2. As for the codec options in the config YAML, they likely don't have a significant effect on the dataset. You can generally stick with the default settings unless you have a specific need to change them.

  3. The folder structure within the HDF5 archives stores audio that has been segmented via VAD (Voice Activity Detection) under the "data" key. These files are raw, uncompressed audio. Compression (e.g., codec-specific formats) is applied in the Dataset class later.

  4. I would not recommend packing everything into a single HDF5 archive, as this could cause memory issues, especially with large datasets. It's better to split the data into multiple HDF5 archives.

  5. Currently, I am using perfectly synchronized codec and original files. While I think slight misalignment (a few milliseconds) may still work, it could potentially affect the results, so synchronization is ideal.
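Points 3 and 4 could be sketched roughly like this. This is a minimal sketch, not the repo's actual packing script: the `train_{shard}.h5` file naming and the numbered `data/{i}` sub-keys are my assumptions; only the raw-audio-under-`"data"` layout and the split into several archives come from the answers above.

```python
import os

import h5py
import numpy as np


def pack_segments(segments, out_dir, per_archive=100):
    """Write raw (uncompressed) VAD-segmented audio under a "data" group,
    split across several HDF5 archives to keep each file small.

    segments: list of 1-D numpy arrays (one VAD segment each).
    Returns the list of archive paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for shard, start in enumerate(range(0, len(segments), per_archive)):
        # Assumed naming scheme -- adjust to whatever the config expects.
        path = os.path.join(out_dir, f"train_{shard}.h5")
        with h5py.File(path, "w") as f:
            for i, seg in enumerate(segments[start:start + per_archive]):
                f.create_dataset(f"data/{i}", data=np.asarray(seg))
        paths.append(path)
    return paths
```

Codec compression is deliberately not applied here, since (per point 3) that happens later inside the Dataset class.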

I hope this helps! Let me know if you have any further questions.

@maisfeldchen
Author

Thank you, this cleared up a few things. However, your answer to my 4th question honestly raised more questions than it answered...

How would I go about creating these structured HDF5 archives exactly? Do you happen to have a script that you used to create them?

@JusperLee
Owner

[image attachment] You can structure your own dataset in this way.
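In case the attached image is unavailable: here is a hedged sketch of one way to build such an archive from .wav files, under the assumptions stated earlier (raw audio under the "data" key, compression applied later in the Dataset class). The helper names and the `data/{i}` key layout are mine, not the repo's actual script.

```python
import os
import wave

import h5py
import numpy as np


def wav_to_array(path):
    """Read a mono 16-bit PCM .wav into float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0


def build_archive(wav_paths, out_path):
    """Pack the raw audio of each file under the archive's "data" key.

    Codec compression is not done here; it happens later in the
    Dataset class, as described above.
    """
    with h5py.File(out_path, "w") as f:
        for i, p in enumerate(wav_paths):
            f.create_dataset(f"data/{i}", data=wav_to_array(p))
```

Usage would be something like `build_archive(sorted(glob.glob("segments/*.wav")), "hdf5_datas/train_0.h5")`, repeated per shard so no single archive grows too large.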
