Question regarding how gradient accumulation is done (it looks like the loss isn't divided by accumulation_steps before backprop) #151

zmy1116 opened this issue Aug 8, 2024 · 0 comments

zmy1116 commented Aug 8, 2024

Hello,

I'm starting to train VoiceCraft on a custom dataset. I have a different hardware setup (L4 GPUs instead of A40s), so I'm adjusting the training configuration.

I noticed that you use an unusually large number of gradient accumulation steps (12), and when you backpropagate it looks like the loss is not averaged over the accumulation steps:

for j in range(self.args.gradient_accumulation_steps):
    cur_ind = all_inds[j::self.args.gradient_accumulation_steps]
    cur_batch = {key: batch[key][cur_ind] for key in batch}
    with torch.cuda.amp.autocast(dtype=torch.float16 if self.args.precision=="float16" else torch.float32):
        out = self.model(cur_batch)

VoiceCraft/steps/trainer.py, lines 138 to 141 at commit 4873249:
if self.args.optimizer_name == "ScaledAdam":
    self.scaler.scale(out['loss']).backward()
else:
    self.scaler.scale(out['loss']/out['effective_ntoken']).backward()
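
For reference, the pattern I had expected is to divide each micro-batch loss by the number of accumulation steps before calling backward, so the summed gradient matches what a single large batch would produce. Below is a minimal sketch of that pattern, not the repo's code; model, batches, optimizer, and args are placeholders for this illustration.

# Minimal sketch of loss normalization under gradient accumulation.
# model, batches, optimizer, args are placeholders, not VoiceCraft's objects.
import torch

scaler = torch.cuda.amp.GradScaler()
accum_steps = args.gradient_accumulation_steps
optimizer.zero_grad()
for cur_batch in batches:  # the accum_steps micro-batches of one effective batch
    with torch.cuda.amp.autocast(dtype=torch.float16):
        out = model(cur_batch)
        # dividing by accum_steps keeps the accumulated gradient the same
        # magnitude as a single forward/backward over the full batch
        loss = out['loss'] / out['effective_ntoken'] / accum_steps
    scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()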

Does this mean the backpropagated loss becomes proportional to the number of gradient accumulation steps? Say you are doing 12 steps now on an A40 GPU with 48 GB of memory; since I use an L4 GPU with 24 GB of memory, I need to halve the per-step batch size and increase the gradient accumulation steps accordingly, which would be equivalent to changing the LR by a factor of 2.

Alternatively, I've been reworking the dynamic sampler and I can fit 20,000 audio tokens of training on 8 L4 GPUs with 4 accumulation steps instead of 12. If I don't adjust the LR, the effective LR would drop to 1/3.
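
To spell out the arithmetic I'm assuming (the accumulated gradient growing roughly linearly with the number of micro-batches when each micro-batch loss is only normalized by effective_ntoken), the LR values below are placeholders for illustration:

# Back-of-the-envelope effective-LR scaling under the assumption above.
base_lr = 1e-4      # placeholder LR, not the repo's actual value
base_steps = 12     # accumulation steps in the A40 config
new_steps = 4       # my L4 setup with the reworked dynamic sampler

effective_lr = base_lr * new_steps / base_steps
print(effective_lr / base_lr)  # 0.333... -> effective LR drops to 1/3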

What do you think?

Thanks
