This is a quick evaluation of the performance of different architectures on ImageNet-2012.
The setup is similar to common ImageNet training, with the following differences:
- Images are resized so that the smaller side = 128, for speed reasons.
- Networks are initialized with LSUV-init (see the sketch below).
ResNet experiments have been moved to ResNets.md.
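LSUV (Layer-Sequential Unit-Variance) init pre-initializes each conv/fc layer with orthonormal weights and then rescales them, layer by layer, until the output variance on a data batch is close to 1. Below is a minimal framework-agnostic sketch of the idea; the `forward_layer` helper, tolerance, and iteration cap are illustrative assumptions, not the exact code used for these runs.

```python
import numpy as np

def orthonormal(shape, rng=np.random):
    """Orthonormal init (Saxe et al.): SVD of a Gaussian matrix, reshaped to `shape`."""
    flat = (shape[0], int(np.prod(shape[1:])))
    a = rng.standard_normal(flat)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else v
    return q.reshape(shape)

def lsuv_scale(weights, forward_layer, batch, tol=0.1, max_iter=10):
    """Rescale `weights` until the layer's output variance on `batch` is ~1.

    `forward_layer(weights, batch)` is an assumed helper that returns the layer's
    pre-activation output for the given input batch.
    """
    for _ in range(max_iter):
        out = forward_layer(weights, batch)
        var = out.var()
        if abs(var - 1.0) < tol:
            break
        weights = weights / np.sqrt(var)
    return weights
```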
CaffeNet only
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
CaffeNet256 | 0.565 | 1.87 | Reference BVLC model, LSUV init |
CaffeNet128 | 0.471 | 2.36 | Pool5 = 3x3 |
CaffeNet128_4096 | 0.497 | 2.24 | Pool5 = 3x3, fc6-fc7=4096 |
CaffeNet128All | 0.530 | 2.05 | All improvements without CaffeNet arch change: ELU + SPP + color_trans3-10-3 + Nesterov + (AVE+MAX) Pool + linear lr_policy (see the sketches below) |
Gain over vanilla CaffeNet128 | +0.06 | | "Sum of gains" = 0.018 + 0.013 + 0.015 + 0.003 + 0.013 + 0.023 = 0.085 |
SqueezeNet128 | 0.530 | 2.08 | Reference solver, but linear lr_policy and batch_size=256 (320K iters). WITHOUT tricks like ELU, SPP, AVE+MAX, etc. |
SqueezeNet128 | 0.547 | 2.08 | New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc. |
SqueezeNet224 | 0.592 | 1.80 | New SqueezeNet solver. WITHOUT tricks like ELU, SPP, AVE+MAX, etc., 2 GPU |
SqueezeNet128+ELU | 0.555 | 1.95 | Reference solver, but linear lr_policy and batch_size=256 (320K iters). ELU |
CaffeNet256All | 0.613 | 1.64 | All improvements without CaffeNet arch change: ELU + SPP + color_trans3-10-3 + Nesterov + (AVE+MAX) Pool + linear lr_policy |
CaffeNet128, no pad | 0.411 | 2.70 | No padding, but conv1 stride=2 instead of 4 to keep size of pool5 the same |
CaffeNet128, dropout in conv | 0.426 | 2.60 | Dropout before pool2=0.1, after conv3 = 0.1, after conv4 = 0.2 |
CaffeNet128SPP | 0.483 | 2.30 | SPP= 3x3 + 2x2 + 1x1 |
DarkNet128BN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF. BN + PReLU, base_lr=0.035, exp lr_policy, 160K iters |
CaffeNet128, no group conv | 0.487 | 2.26 | Plain convolution instead of grouped one |
NiN128 | 0.519 | 2.15 | Step lr_policy. Be careful not to use in-place dropout on maxpool |
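The "linear lr_policy" mentioned in several rows above decays the learning rate linearly from base_lr to zero over the whole run (the same schedule as Caffe's "poly" policy with power = 1). A minimal sketch, with base_lr and max_iter taken from the corresponding solver:

```python
def linear_lr(base_lr, iteration, max_iter):
    """Linearly decayed learning rate: base_lr * (1 - iter / max_iter)."""
    return base_lr * (1.0 - float(iteration) / max_iter)

# e.g. for a 320K-iteration run:
# linear_lr(0.01, 0, 320000)      -> 0.01
# linear_lr(0.01, 160000, 320000) -> 0.005
# linear_lr(0.01, 320000, 320000) -> 0.0
```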
Others
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
DarkNetBN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF.BN |
HeNet2x2 | 0.561 | 1.88 | No SPP, Pool5 = 3x3, VLReLU, J' from paper |
HeNet3x1 | 0.560 | 1.88 | No SPP, Pool5 = 3x3, VLReLU, J' from paper, 2x2->3x1 |
GoogLeNet128 | 0.619 | 1.61 | linear lr_policy, batch_size=256. obviously slower than caffenet |
GoogLeNet128Res | 0.634 | 1.56 | linear lr_policy, batch_size=256. Residual connections between inception blocks. No BN |
GoogLeNet128Res_color | 0.638 | 1.52 | linear lr_policy, batch_size=256. Residual connections between inception blocks. No BN. + color_trans3-10-3 |
googlenet_loss2_clf | 0.571 | 1.80 | from net above, aux classifier after inception_4d |
googlenet_loss1_clf | 0.520 | 2.06 | from net above, aux classifier after inception_4a |
GoogLeNet128_BN_after | 0.596 | 1.70 | BN After ReLU |
[GoogLeNet128_BN_lim0606](https://github.com/lim0606/caffe-googlenet-bn) | 0.645 | 1.54 | BN before ReLU + scale bias, linear LR, batch_size = 128, base_lr = 0.005, 640K iter, LSUV init. !!! 5x5 replaced by two 3x3 |
fitnet1_elu | 0.333 | 3.21 | |
VGGNet16_128 | 0.651 | 1.46 | Surprisingly, much better than GoogLeNet128, even with a step-based solver. |
VGGNet16_128_All | 0.682 | 1.47 | ELU (a=0.5; a=1 leads to divergence :( ), avg+max pool, color conversion, linear lr_policy (see the sketches below) |
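Two of the recurring tricks in the tables above: ELU, i.e. elu(x) = x for x > 0 and a*(exp(x) - 1) otherwise (the a=0.5 note for VGGNet16_128_All is this alpha), and "(AVE+MAX)" pooling, which combines average- and max-pooled maps of the same input. The exact combination rule is not spelled out in the tables, so the elementwise sum in the sketch below is an illustrative assumption:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: identity for positive inputs, alpha*(exp(x)-1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

def ave_plus_max_pool(x, k=3, stride=2):
    """Illustrative (AVE+MAX) pooling: elementwise sum of average- and max-pooled maps.
    x has shape (C, H, W); windows that do not fit are dropped for simplicity."""
    C, H, W = x.shape
    Ho, Wo = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.empty((C, Ho, Wo), dtype=x.dtype)
    for i in range(Ho):
        for j in range(Wo):
            win = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = win.mean(axis=(1, 2)) + win.max(axis=(1, 2))
    return out
```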
Architectures tested:
- CaffeNet (pool5 size = 3x3)
- HeNet, from "Convolutional Neural Networks at Constrained Time Cost". Differences from the paper: VLReLU (converges faster at the start) and no SPP pooling; a "classical" pool5 is used instead
- CaffeNetSPP, single-scale training (SPP pool5 = 3x3 + 2x2 + 1x1), from "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition" (see the SPP sketch below)
- GoogLeNet, from "Going Deeper with Convolutions"
Architectures are selected so that their theoretical and/or practical computational complexity is roughly that of CaffeNet. Currently this holds for all except HeNet, which is slower in practice.
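The SPP pool5 (3x3 + 2x2 + 1x1 bins) turns a conv5 map of any spatial size into a fixed 14-bins-per-channel vector, so the fc layers see a fixed input length regardless of image resolution. A minimal max-pooling sketch (bin edges are computed with simple rounding; the SPP paper uses slightly different floor/ceil rules):

```python
import numpy as np

def spp_max(feature_map, levels=(3, 2, 1)):
    """Spatial pyramid max pooling over a (C, H, W) map.
    Returns a vector of length C * sum(n*n for n in levels), e.g. C * 14 for (3, 2, 1).
    Assumes H, W >= max(levels) so no bin is empty."""
    C, H, W = feature_map.shape
    pooled = []
    for n in levels:
        hs = np.linspace(0, H, n + 1).astype(int)  # bin edges along height
        ws = np.linspace(0, W, n + 1).astype(int)  # bin edges along width
        for i in range(n):
            for j in range(n):
                bin_ = feature_map[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))
    return np.concatenate(pooled)
```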
*** Contrib. Base net here is CaffeNet + BN + PReLU + dropout=0.2. The variants differ in the filters (mainly, 5x5 -> 3x3 + 3x3 or 1x5 + 5x1) and in the solver; a parameter-count comparison of these factorizations follows the table below.
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
Base | 0.527 | 2.09 | |
Base_dereyly_lr, noBN, ReLU | 0.441 | 2.53 | max_iter=160K, stepsize=2K, gamma=0.915, but default caffenet |
Base_dereyly 5x1, noBN, ReLU | 0.474 | 2.31 | 5x5->1x5+5x1 |
Base_dereyly_PReLU | 0.550 | 1.93 | BN, PReLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->3x3+3x3 |
Base_dereyly 3x1 | 0.553 | 1.92 | PReLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x3+1x3+3x1+1x3 |
Base_dereyly 3x1 scale aug | 0.530 | 2.04 | Same as previous, img: 128 crop from (128...300)px image, test resize to 144, crop 128 |
Base_dereyly 3x1 scale aug | 0.512 | 2.17 | Same as previous, img: 128 crop from (128...300)px image, test resize to (128+300)/2, crop 128 |
Base_dereyly 3x1->5x1 | 0.546 | 1.97* | PReLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x5+1x5+5x1+1x5 |
Base_dereyly 3x1,halfBN | 0.544 | 1.95 | PReLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x3+1x3+3x1+1x3, BN only for pool and fc6 |
Base_dereyly 5x1 | 0.540 | 2.00 | PReLU + base_lr=0.035, exp lr_policy, 160K iters, 5x5->1x5+5x1 |
DarkNetBN | 0.502 | 2.25 | 16C3->MP2->32C3->MP2->64C3->MP2->128C3->MP2->256C3->MP2->512C3->MP2->1024C3->1000CLF. BN + PReLU, base_lr=0.035, exp lr_policy, 160K iters |
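The filter factorizations used in this table (5x5 -> 3x3 + 3x3, or 1xN + Nx1 pairs) replace one large kernel with a stack of smaller ones that together have fewer parameters. A back-of-the-envelope comparison, ignoring biases and assuming equal input and output channel counts C:

```python
def conv_params(kernels, c):
    """Parameter count for a stack of conv layers, each with c input and c output channels."""
    return sum(kh * kw * c * c for kh, kw in kernels)

c = 256
print(conv_params([(5, 5)], c))          # 5x5        -> 25 * c^2
print(conv_params([(3, 3), (3, 3)], c))  # 3x3 + 3x3  -> 18 * c^2
print(conv_params([(1, 5), (5, 1)], c))  # 1x5 + 5x1  -> 10 * c^2
```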
Name | Accuracy | LogLoss | Comments |
---|---|---|---|
VGG-Like | 0.521 | 2.14 | 1st layer = 7x7 stride 2, unlike VGG. All other layers = 1/2 VGG width |
VGG-LikeRes | 0.576 | 1.83 | with residual connections, no BN |
VGG-LikeResDrop | 0.568 | 1.91 | with residual connections, no BN , dropout in conv |
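The "Res" variants above (GoogLeNet128Res, VGG-LikeRes) add identity shortcuts around blocks without BN: the block output is summed elementwise with its input, which requires matching shapes. A minimal sketch of the pattern, with `block` standing in for whatever conv stack is being wrapped:

```python
import numpy as np

def residual(block, x):
    """Identity shortcut: output = block(x) + x (shapes must match)."""
    return block(x) + x

# usage sketch with a shape-preserving "block" (here just a ReLU for brevity)
x = np.random.randn(64, 16, 16)
y = residual(lambda t: np.maximum(t, 0.0), x)
```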
PRs with tests are welcome.
P.S. Logs are merged from many "save-resume" sessions, because the networks were trained at night, so a plot of "Accuracy vs. seconds" will give weird results.