Additional questions #49
Hi Minseung-Kim,

For 1): That is indeed how we get the noise for DEMAND-VB, and we use only the noise from DEMAND-VB (no external noise set is used). Any one of those noise samples can then be used to corrupt any clean speech recording, i.e., we no longer treat them as clean speech and noise pairs; we treat them as two independent sets (this applies to the training set only). See Line 258 in aad965d.

A noise sample is randomly selected to corrupt a clean speech recording only if its length is equal to or greater than that of the clean speech recording. If a randomly selected noise sample does not meet this condition, another is selected, and this continues until one that meets the condition is found.

For 2): If I were re-implementing this framework now, I would certainly use early stopping. But back in 2019, a maximum number of epochs was specified, and the epoch that attained the highest validation score was selected as the epoch to be tested.

I hope this helps; please let me know if anything I said is unclear.
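For reference, the selection procedure described above could be sketched in Python roughly as follows (a minimal sketch, not the actual Deep Xi code; the function name and list-of-sequences representation are illustrative assumptions):

```python
import random

def select_noise(noise_set, clean_len, rng=random):
    """Randomly pick a noise sample at least as long as the clean recording,
    then take a random segment of matching length from it.

    Hypothetical helper: noise_set is a list of 1-D sample sequences,
    clean_len is the length (in samples) of the clean speech recording.
    """
    # Keep only noise samples long enough to cover the clean recording.
    candidates = [n for n in noise_set if len(n) >= clean_len]
    if not candidates:
        raise ValueError("no noise sample is long enough for this recording")
    noise = rng.choice(candidates)
    # Random start offset so different segments of the noise file are used.
    start = rng.randrange(len(noise) - clean_len + 1)
    return noise[start:start + clean_len]
```

In practice the loop-until-valid behaviour described above is equivalent to filtering first and then sampling, as done here, since rejected samples are simply redrawn.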
On a side note, I am now using PyTorch and PyTorch Lightning myself; let me know if you are interested in helping to update this repository to a PyTorch-based version :)
Thank you for the reply! Now I understand.
We have only used the 28-speaker version.
Oh, thank you for the response. Is there any different way to set up the validation set (in your experience)?
Hi Minseung-Kim, Using two of the speakers for the validation set has been the standard way. I have not personally seen it done another way :) |
@anicolson |
Hello again,
I am trying to reproduce the Deep Xi framework in PyTorch (TensorFlow is not so familiar to me... lol) and have some questions.
When we subtract the clean speech from the noisy speech, we get the corresponding noise signal.
For the DEMAND Voicebank dataset, did you use only those provided pairs, or an additional clean or noise dataset?
In my previous question, you said that the noise recording used to corrupt the clean speech is randomly selected (this implies the noise recording should be at least as long as the clean speech).
If so, could you tell me what kind of additional noise recordings you used? And have you used additional clean speech other than that provided in the DEMAND Voicebank dataset?
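The subtraction mentioned above is straightforward when the noisy and clean pairs are time-aligned, as they are in the Voicebank-DEMAND corpus. A minimal sketch (the function name is illustrative; real code would operate on NumPy or PyTorch tensors rather than plain lists):

```python
def extract_noise(noisy, clean):
    """Recover the noise signal from an aligned noisy/clean pair.

    Since noisy = clean + noise was formed by additive mixing,
    sample-wise subtraction recovers the noise exactly.
    """
    if len(noisy) != len(clean):
        raise ValueError("noisy and clean recordings must be the same length")
    return [x - s for x, s in zip(noisy, clean)]
```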
As far as I know, the validation set is often used for early stopping. Is the validation set in the Deep Xi framework also used for this purpose?
Could you explain to me how the validation set was used?
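For what it's worth, a common way to combine the two strategies discussed in this thread (early stopping with patience, while also remembering the best-scoring epoch for testing) can be sketched like this (a generic illustration, not Deep Xi code; the class name and `patience` default are assumptions):

```python
class EarlyStopping:
    """Track the best validation score and stop after `patience` epochs
    without improvement. Assumes higher score = better."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record one epoch's validation score; return True to stop training."""
        if score > self.best:
            self.best, self.best_epoch = score, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

After training, `best_epoch` identifies the checkpoint to evaluate on the test set, which matches the fixed-epoch-budget approach described in the reply above.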
Thank you!