Replies: 2 comments 3 replies
-
@twmht Yes, thanks for your finding, it is a good question. The essential reason is a limitation in the way NNI simulates masks, and I think this might be a serious problem for the iterative way, because it means the mask is not a real simulation of pruning in some cases. For this issue, there might be two candidate solutions. One is to automatically synchronize the mask with its context, i.e. mask the output of the BN at the same time. The other is …
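To make the first candidate concrete, here is a minimal plain-PyTorch sketch (not NNI API) of what "synchronizing the mask with its context" could look like: the same output-channel mask is applied to the conv weights and to the affine parameters of the following BatchNorm, so the BN bias cannot re-activate pruned channels. The function name sync_bn_mask and the channel_mask tensor are illustrative assumptions.

```python
# Illustrative sketch only: propagate a conv output-channel mask to the
# following BatchNorm so that masked channels really produce zeros.
# `sync_bn_mask` and `channel_mask` are made-up names, not NNI API.
import torch
import torch.nn as nn

def sync_bn_mask(conv: nn.Conv2d, bn: nn.BatchNorm2d, channel_mask: torch.Tensor):
    """channel_mask: 1-D float tensor of shape [out_channels], 1 = keep, 0 = pruned."""
    with torch.no_grad():
        # Mask the conv weights (and bias, if any) per output channel.
        conv.weight.mul_(channel_mask.view(-1, 1, 1, 1))
        if conv.bias is not None:
            conv.bias.mul_(channel_mask)
        # Mask the BN scale (gamma) and bias (beta) as well, so the pruned
        # channels stay exactly zero after normalization.
        bn.weight.mul_(channel_mask)
        bn.bias.mul_(channel_mask)
        bn.running_mean.mul_(channel_mask)

# Tiny check: prune the last two channels of a conv-BN pair.
conv = nn.Conv2d(3, 8, 3, padding=1)
bn = nn.BatchNorm2d(8)
mask = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0], dtype=torch.float32)
sync_bn_mask(conv, bn, mask)
out = bn(conv(torch.randn(2, 3, 16, 16)))
print(out[:, -2:].abs().max())  # 0.0: the pruned channels stay silent
```

During real fine-tuning the BN mask would have to be re-applied (or enforced in the forward pass) after each optimizer step, in the same spirit as the weight masks, otherwise gradient updates to beta would revive the pruned channels.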
-
Hello @twmht, thank you for continuously exploring and contributing to NNI🤗. Would you mind sharing your current work and suggestions or expectations for NNI with us? Maybe we can book an online meeting this week or next week if you have time~
-
Training from model_speedup() is good, since the bias of batch normalization can be masked out. The example shows an end-to-end trainer, where the model is sped up before fine-tuning.
However, some iterative pruners, like the AGP pruner, update the mask after a number of iterations, so the model can't be sped up before fine-tuning. The zeroed-out weights would then be re-activated by the bias of batch normalization during training; that is, the output of apply_compression_result is not the same as that of model_speedup(). Is this a serious problem? On a small problem like CIFAR-10 or MNIST this might not matter, but what about large-scale training data like ImageNet?
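A minimal plain-PyTorch sketch (not NNI code) of the situation described above: the weights of one conv output channel are zeroed, which is what a weight mask from apply_compression_result effectively does, yet the following BatchNorm's bias still produces a non-zero activation in that channel. The 0.5 bias value is an arbitrary stand-in for whatever the pretrained beta happens to be.

```python
# Sketch of the problem (plain PyTorch, not NNI code): a weight-masked
# channel is re-activated by the following BatchNorm's bias (beta).
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 4, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(4)
bn.eval()  # use running statistics, as at inference time

with torch.no_grad():
    bn.bias.fill_(0.5)        # stand-in for a pretrained, non-zero beta
    conv.weight[0].zero_()    # "prune" output channel 0 via a weight mask

y = bn(conv(torch.randn(1, 3, 8, 8)))
print(y[0, 0].abs().max())    # ~0.5: the masked channel is not silent

# After model_speedup() this channel (and its BN entry) would be removed
# entirely, so this mismatch with the masked model cannot occur.
```

With BN in training mode the effect is the same: the batch-normalized input of the masked channel is zero, so its output is just beta, and gradients keep updating beta during fine-tuning, keeping the "pruned" channel alive until speedup physically removes it.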