Train network with no bias in convolution layer #225
You can set the lr_mult of the batchnorm beta term to 0 to fix beta, which is initialized as 0.
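For reference, a minimal sketch of that idea with the MXNet symbol API (layer names like `conv1` are placeholders): declare the beta variable explicitly with `lr_mult=0` so the optimizer never updates it.

```python
import mxnet as mx

# Minimal sketch (hypothetical layer names): declare beta explicitly with
# lr_mult=0 so it is never updated; beta is initialized to 0, so the
# batchnorm effectively has no learnable shift term.
data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')
beta = mx.sym.Variable('conv1_bn_beta', lr_mult=0.0, init=mx.init.Zero())
bn = mx.sym.BatchNorm(data=conv, beta=beta, name='conv1_bn')
relu = mx.sym.Activation(data=bn, act_type='relu', name='conv1_relu')
```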
Thanks @zhreshold, it worked!
@zhreshold does this also fix the 'gamma' term?
@titikid You can fix gamma or leave it free, depending on your results, but I would prefer to leave it free.
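In case it helps, MXNet's BatchNorm controls gamma separately through the `fix_gamma` flag, so fixing beta does not by itself fix gamma; a sketch (placeholder names again):

```python
import mxnet as mx

# Sketch: beta is pinned via lr_mult=0, while fix_gamma=False keeps the
# per-channel scale (gamma) learnable. Setting fix_gamma=True would pin
# gamma to 1 as well.
data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')
beta = mx.sym.Variable('conv1_bn_beta', lr_mult=0.0)
bn = mx.sym.BatchNorm(data=conv, beta=beta, fix_gamma=False, name='conv1_bn')
```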
I already trained 2 models from scratch; all parameters are set to the defaults (lr=0.004, batch=48, single GPU).
You have to use ImageNet pre-trained weights; otherwise you need a DSSD variant.
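As a rough illustration of initializing from ImageNet pre-trained weights (checkpoint prefix and epoch below are placeholders):

```python
import mxnet as mx

# Rough sketch (hypothetical checkpoint prefix/epoch): load ImageNet
# pre-trained parameters so the backbone does not start from scratch.
_, arg_params, aux_params = mx.model.load_checkpoint('mobilenet', 0)

# They can then be passed to Module.fit; allow_missing=True lets the
# SSD-specific layers with no pre-trained counterpart be initialized fresh:
#   mod.fit(train_iter, arg_params=arg_params, aux_params=aux_params,
#           allow_missing=True, ...)
```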
Hi @zhreshold
@zhreshold can you give me some suggestions?
@titikid For maximum flexibility I suggest you use a broadcast multiply instead of batchnorm itself. You get full control over the behavior without hacking batchnorm.
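One possible reading of that suggestion, as a sketch (names and the channel count are placeholders): normalize with batchnorm's affine terms fixed, then apply your own per-channel scale with `broadcast_mul`, so you decide exactly which terms exist and how they are trained.

```python
import mxnet as mx

# Sketch: keep batchnorm for the normalization itself, but handle the
# per-channel scaling with an explicit broadcast multiply that you control.
num_filter = 64
data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')
beta = mx.sym.Variable('conv1_bn_beta', lr_mult=0.0)        # shift fixed at 0
bn = mx.sym.BatchNorm(data=conv, beta=beta, fix_gamma=True, name='conv1_bn')

# per-channel scale, shaped (1, C, 1, 1) so it broadcasts over N, H, W
scale = mx.sym.Variable('conv1_scale', shape=(1, num_filter, 1, 1),
                        init=mx.init.One())
out = mx.sym.broadcast_mul(bn, scale, name='conv1_scaled')
```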
@zhreshold I'm not really clear on what you mean for now, but I will investigate it. Thanks!
Hi @zhreshold
I already trained my MobileNet-SSD network in Caffe with no bias. However, the convergence speed of the network is too slow (mAP ~35% after 3 days).
I just tried MXNet and found that its training performance is significantly better than Caffe's, but I don't know how to remove the 'beta' term in the batch norm layer in MXNet like I did in Caffe. As an alternative, I removed the batchnorm layer, but then the network couldn't converge.
Can you give me some hints?