Hello, thanks for your great work.
Can you provide additional ablations obtained with different batch sizes? (e.g., a smaller batch size of 512/256 instead of the 1024 reported in the paper)
When I vary the training batch size, I find that the final results vary a lot.
Hi, @Alxead .
From our experience, a "sqrt" scaling rule should be used to adjust the learning rate when the batch size changes.
In our default setting, the effective learning rate for batch size 1024 is: 1024 / 256 * 1 = 4 (the base learning rate of 1 is scaled linearly by batch size / 256).
With sqrt scaling, the effective learning rate for batch size 512 should be: 4 * sqrt(512 / 1024) ≈ 2.828. You can pass '--base-lr 1.414' to the train script to achieve this, since 1.414 * 512 / 256 ≈ 2.828.
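A minimal sketch of this rule, assuming the train script scales '--base-lr' linearly by batch size / 256 (as implied by the 1024 / 256 * 1 = 4 example above); the function and variable names here are purely illustrative, not part of the repo:

```python
import math

REF_BATCH = 256        # batch size the base learning rate is defined for
DEFAULT_BATCH = 1024   # default batch size reported in the paper
DEFAULT_BASE_LR = 1.0  # default '--base-lr'

def effective_lr(base_lr: float, batch_size: int) -> float:
    """Linear scaling assumed to be applied by the train script."""
    return base_lr * batch_size / REF_BATCH

def base_lr_for_batch(batch_size: int) -> float:
    """Base LR to pass via '--base-lr' so the effective LR follows the
    sqrt rule relative to the default setting (batch 1024, effective LR 4)."""
    default_effective = effective_lr(DEFAULT_BASE_LR, DEFAULT_BATCH)            # 4.0
    target_effective = default_effective * math.sqrt(batch_size / DEFAULT_BATCH)
    return target_effective / (batch_size / REF_BATCH)

print(base_lr_for_batch(512))  # ~1.414 -> effective LR ~2.828
print(base_lr_for_batch(256))  # ~2.0   -> effective LR ~2.0
```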
Hi,
Thank you for your contribution. I was wondering whether you use learning rate decay, since the learning rate is quite high and should be reduced as the network converges.
Thanks,
Ram