Hello, thanks for your great work.
Can you provide additional ablations obtained with different batch sizes? (e.g., a smaller batch size of 512/256 instead of the 1024 reported in the paper)
When I vary the training batch size, I find that the final results vary a lot.
Hi, @Alxead .
From our experience, a "sqrt" scaling rule should be used to adjust the learning rate when the batch size changes.
In our default setting, the effective learning rate for batch size 1024 is: 1024 / 256 * 1 = 4 (the base learning rate of 1 is scaled linearly by batch size / 256).
With sqrt scaling, the effective learning rate for batch size 512 should be: 4 * sqrt(512 / 1024) ≈ 2.828. You can pass '--base-lr 1.414' to the train script to achieve this, since 1.414 * 512 / 256 ≈ 2.828.
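A minimal sketch of this rule, assuming the train script scales '--base-lr' linearly by batch size / 256 (as implied by the 1024 / 256 * 1 = 4 example above); the function and variable names here are purely illustrative, not part of the repo:

```python
import math

REF_BATCH = 256        # batch size the base learning rate is defined for
DEFAULT_BATCH = 1024   # default batch size reported in the paper
DEFAULT_BASE_LR = 1.0  # default '--base-lr'

def effective_lr(base_lr: float, batch_size: int) -> float:
    """Linear scaling assumed to be applied by the train script."""
    return base_lr * batch_size / REF_BATCH

def base_lr_for_batch(batch_size: int) -> float:
    """Base LR to pass via '--base-lr' so the effective LR follows the
    sqrt rule relative to the default setting (batch 1024, effective LR 4)."""
    default_effective = effective_lr(DEFAULT_BASE_LR, DEFAULT_BATCH)            # 4.0
    target_effective = default_effective * math.sqrt(batch_size / DEFAULT_BATCH)
    return target_effective / (batch_size / REF_BATCH)

print(base_lr_for_batch(512))  # ~1.414 -> effective LR ~2.828
print(base_lr_for_batch(256))  # ~2.0   -> effective LR ~2.0
```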
Hi,
Thank you for your contribution. I was wondering whether you use learning rate decay, since the learning rate is quite high and should be reduced as the network converges.
Thanks,
Ram