a pytorch implement of shuffleNet on cifar-10
-
channel shuffle is a operation proposed in shuffleNet to adress the information isolation between channels while using successive group convolution.
-
It can be done using only several lines code
# channel shuffle n, c, w, h = x.shape x = x.view(n, self.g, self.n, w, h) x = x.transpose_(1, 2).contiguous() x = x.view(n, c, w, h)
-
there is a demo picture to show what happened when channel shuffle
-
To make it suit cifar10's image size, I have disabled some downsample operation (i.e. maxpooling or stride = 2) and just keep the last two
-
because of the low efficiency of group convolution, it takes relatively long time to train, more details can be seen below
scale factor groups params/M flops/M training time accuracy 1.0 8 0.9131 161.70 11.4h 92.29% 0.5 8 0.2507 43.43 6.5h 91.48% 0.5 3 0.2427 42.97 4.0h 92.60% 0.5 1 0.2487 44.63 3.6h 91.44% -
here the accuracy means the max accuracy on validation set
-
training time is measured on a titan x (pascal) GPU
-
the results is comparable with resnet 20 which have the similar number of parameters:
resnet 20 params: 0.27M accuracy: 91.25%
-
-
more logs and the best weights can be get in folder
bak
- pytorch 0.4.0
- python 3.x