
Calculate batch norm statistic loss on parallel training #16

Open
dohe0342 opened this issue May 6, 2021 · 3 comments

dohe0342 commented May 6, 2021

Hello, I have one question about the batch norm statistic loss.

Consider parallel training: I have 8 GPUs, and each GPU can hold a batch size of 128.

But as you know, the batch norm statistic loss is computed on each device separately, and the devices only share gradients, not the whole batch (1024). I think this can degrade image quality.

So here is my question: how can I calculate the batch norm statistic loss in parallel training as if it were computed over the whole batch rather than each per-GPU mini-batch?
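For concreteness, what I would like is something like the sketch below (just an illustration, not code from this repository; `bn_stat_loss_global` is a placeholder name, and it assumes a recent PyTorch where `torch.distributed.nn.functional.all_reduce` provides an autograd-aware all-reduce). The per-GPU channel statistics are all-reduced so the statistic loss is taken against the mean/variance of the whole 1024-image batch:

```python
# Illustrative sketch only. Assumes torch.distributed is initialized and the
# model runs under DistributedDataParallel with one process per GPU.
import torch
import torch.distributed as dist
from torch.distributed.nn.functional import all_reduce  # autograd-aware collective


def bn_stat_loss_global(feat, bn):
    """feat: the (N, C, H, W) input to a BatchNorm2d layer on this GPU.
    bn:   that BatchNorm2d module, whose running stats we want to match."""
    n_local = feat.numel() / feat.shape[1]            # samples per channel on this GPU
    local_sum = feat.sum(dim=[0, 2, 3])
    local_sq_sum = (feat * feat).sum(dim=[0, 2, 3])

    # Sum the cheap per-channel statistics across all GPUs; only C-sized
    # tensors are communicated, not the full activations.
    global_sum = all_reduce(local_sum, op=dist.ReduceOp.SUM)
    global_sq_sum = all_reduce(local_sq_sum, op=dist.ReduceOp.SUM)
    n_global = n_local * dist.get_world_size()

    mean = global_sum / n_global
    var = global_sq_sum / n_global - mean * mean

    # DeepInversion-style feature-distribution regularizer against running stats.
    return torch.norm(mean - bn.running_mean, 2) + torch.norm(var - bn.running_var, 2)
```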

hkunzhe commented May 13, 2021

If you are using DistributedDataParallel, try to convert BatchNorm layers to SyncBatchNorm ones.
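For example (a minimal sketch; `build_model` and `local_rank` are placeholders for however your script builds the network and obtains the process rank):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

model = build_model()  # placeholder for the network being inverted
# Replace every BatchNorm layer with SyncBatchNorm so batch statistics are
# computed across all processes in the default process group.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = DDP(model.cuda(local_rank), device_ids=[local_rank])
```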

dohe0342 (Author) commented

I know about SyncBatchNorm, but DeepInversion has to compute a loss over every pixel, and my GPU can't handle that.

hongxuyin (Contributor) commented

Hi @dohe0342, one thing to try is reducing the batch size to alleviate the GPU burden. Also try the 2k-iteration setting to save on GPU memory. Additionally, you can try the synthesized dataset we provide in the repository. Let me know if it helps.
