Hello,
I'm starting to train VoiceCraft on a custom dataset. I have a different hardware setup (an L4 GPU instead of an A40), so I'm adjusting the training configuration.
I noticed that you use an unusually large number of gradient accumulation steps (12), and when you backpropagate, it looks like the loss isn't averaged over the accumulation steps:
VoiceCraft/steps/trainer.py
Lines 87 to 91 in 4873249
VoiceCraft/steps/trainer.py
Lines 138 to 141 in 4873249
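To make sure I'm describing the concern correctly, here is a minimal sketch (not the actual trainer.py code; the tiny linear model, random data, and `accum_steps` split are placeholders) of why the accumulated gradient grows with the number of accumulation steps when the per-step loss isn't divided by that number:

```python
# Minimal sketch, assuming the per-step loss is a mean over its micro-batch.
# Toy model/data are placeholders, not VoiceCraft's trainer.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)
x, y = torch.randn(96, 8), torch.randn(96, 1)

def grad_norm_after_accumulation(accum_steps, average):
    model.zero_grad()
    per_step = 96 // accum_steps                    # smaller micro-batch as accum_steps grows
    for i in range(accum_steps):
        xb = x[i * per_step:(i + 1) * per_step]
        yb = y[i * per_step:(i + 1) * per_step]
        loss = torch.nn.functional.mse_loss(model(xb), yb)  # mean over the micro-batch
        if average:
            loss = loss / accum_steps               # keeps the update scale constant
        loss.backward()                             # .backward() sums gradients across calls
    return model.weight.grad.norm().item()

for steps in (4, 12):
    print(steps, grad_norm_after_accumulation(steps, average=False))
# Without the division, the accumulated gradient (and hence the effective LR)
# grows roughly in proportion to accum_steps.
```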
Does this mean the backpropagated loss becomes proportional to the number of gradient accumulation steps? Say you are doing 12 steps now on an A40 GPU with 48 GB of memory; since I'm using an L4 GPU with 24 GB, I need to halve the per-step batch size and double the accumulation steps. If the loss isn't averaged, that effectively doubles the gradient magnitude, so I'd have to drop the LR by 2 to keep the runs equivalent.
Alternatively, I've been reworking the dynamic sampler and I'm able to fit the same 20,000 audio tokens per update on 8 L4 GPUs in 4 accumulation steps instead of 12. If I don't adjust the LR, that means the effective LR would drop to 1/3. A back-of-the-envelope check of both scenarios is below.
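This assumes the per-step loss is a mean over its micro-batch and the per-step losses are summed (not averaged) across accumulation steps; `base_lr` is a placeholder, not the repo's actual setting:

```python
# Effective-LR scaling under the assumption above.
base_lr = 1e-4          # placeholder value
baseline_steps = 12     # A40 config

for new_steps in (24, 4):   # 24: halve batch, double steps; 4: reworked dynamic sampler
    scale = new_steps / baseline_steps
    print(f"{new_steps} steps -> effective LR scale x{scale:.2f} ({base_lr * scale:.2e})")
# 24 steps -> x2.00, so I'd drop the LR by 2 to match the A40 run;
# 4 steps  -> x0.33, i.e. the effective LR drops to 1/3 if left unchanged.
```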
What do you think?
Thanks