Replies: 2 comments 3 replies
-
@twmht Yes, thanks for your finding, it is a good question. The essential reason is a limitation in the way NNI simulates masks, and I think this might be a serious problem for the iterative way, because it means the mask is not a real simulation of pruning in some cases. For this issue, there might be two candidate solutions. One is to automatically synchronize the mask with its context, i.e. mask the output of the BN at the same time. The other is …
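To make the first candidate concrete, here is a minimal plain-PyTorch sketch (not NNI API) of what "synchronizing the mask with its context" could look like: the same output-channel mask is applied to the conv weights and to the affine parameters of the following BatchNorm, so the BN bias cannot re-activate pruned channels. The function name sync_bn_mask and the channel_mask tensor are illustrative assumptions.

```python
# Illustrative sketch only: propagate a conv output-channel mask to the
# following BatchNorm so that masked channels really produce zeros.
# `sync_bn_mask` and `channel_mask` are made-up names, not NNI API.
import torch
import torch.nn as nn

def sync_bn_mask(conv: nn.Conv2d, bn: nn.BatchNorm2d, channel_mask: torch.Tensor):
    """channel_mask: 1-D float tensor of shape [out_channels], 1 = keep, 0 = pruned."""
    with torch.no_grad():
        # Mask the conv weights (and bias, if any) per output channel.
        conv.weight.mul_(channel_mask.view(-1, 1, 1, 1))
        if conv.bias is not None:
            conv.bias.mul_(channel_mask)
        # Mask the BN scale (gamma) and bias (beta) as well, so the pruned
        # channels stay exactly zero after normalization.
        bn.weight.mul_(channel_mask)
        bn.bias.mul_(channel_mask)
        bn.running_mean.mul_(channel_mask)

# Tiny check: prune the last two channels of a conv-BN pair.
conv = nn.Conv2d(3, 8, 3, padding=1)
bn = nn.BatchNorm2d(8)
mask = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0], dtype=torch.float32)
sync_bn_mask(conv, bn, mask)
out = bn(conv(torch.randn(2, 3, 16, 16)))
print(out[:, -2:].abs().max())  # 0.0: the pruned channels stay silent
```

During real fine-tuning the BN mask would have to be re-applied (or enforced in the forward pass) after each optimizer step, in the same spirit as the weight masks, otherwise gradient updates to beta would revive the pruned channels.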
-
Hello @twmht, thank you for continuously exploring and contributing to NNI🤗. Would you mind sharing your current work and suggestions or expectations for NNI with us? Maybe we can book an online meeting this week or next week if you have time~
-
Training from model_speedup() is good, since the bias of batch normalization can be masked out. The example shows an end-to-end trainer, where the model is sped up before fine-tuning.
However, some iterative pruners, like the AGP pruner, update the mask after a number of iterations, so the model can't be sped up before fine-tuning. The zeroed-out weights would then be re-activated by the bias of batch normalization during training; that is, the output of apply_compression_result is not the same as that of model_speedup(). Is this a serious problem? On a small problem like CIFAR-10 or MNIST this might not matter, but what about large-scale training data like ImageNet?
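A minimal plain-PyTorch sketch (not NNI code) of the situation described above: the weights of one conv output channel are zeroed, which is what a weight mask from apply_compression_result effectively does, yet the following BatchNorm's bias still produces a non-zero activation in that channel. The 0.5 bias value is an arbitrary stand-in for whatever the pretrained beta happens to be.

```python
# Sketch of the problem (plain PyTorch, not NNI code): a weight-masked
# channel is re-activated by the following BatchNorm's bias (beta).
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 4, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(4)
bn.eval()  # use running statistics, as at inference time

with torch.no_grad():
    bn.bias.fill_(0.5)        # stand-in for a pretrained, non-zero beta
    conv.weight[0].zero_()    # "prune" output channel 0 via a weight mask

y = bn(conv(torch.randn(1, 3, 8, 8)))
print(y[0, 0].abs().max())    # ~0.5: the masked channel is not silent

# After model_speedup() this channel (and its BN entry) would be removed
# entirely, so this mismatch with the masked model cannot occur.
```

With BN in training mode the effect is the same: the batch-normalized input of the masked channel is zero, so its output is just beta, and gradients keep updating beta during fine-tuning, keeping the "pruned" channel alive until speedup physically removes it.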