Benchmarks
We evaluate performance with VGG16, GoogleNet (Inception-V1), ResNet50, Mobilenet, Squeezenet and densenet-121 on the following ARM devices and make a bare-metal comparison with other AI chips. Note that all time usage below is measured in milliseconds (ms).
Device | Processor | #CPUs @ Clock Speed | CPU Arch. | Memory | OS | SOC Power |
---|---|---|---|---|---|---|
Samsung S8 | Snapdragon 835 | 4 @ 2.45GHz + 4 @ 1.90GHz | Kryo | 4GB | Android 7.0 | ~5W |
iPhone 7 | A10 Fusion | 2 @ 2.34GHz + 2 @ 1.05GHz | Hurricane | 2GB | iOS 11.1 | ~5W |
Huawei D05 | Hi1616 | 2 * 32 @ 2.40GHz | Cortex-A72 | 256GB | Ubuntu 16.04 | >100W |
Phytium FT1500A/16 | FTC660 | 16 @ 1.50GHz | Earth | 64GB | Kylin 5.0 | 35W |
RK3399 | RK3399 | 2 @ 1.80GHz + 4 @ 1.40GHz | Cortex-A72 | 2GB | Debian | 6.05W |
Raspberry Pi3 | Broadcom BCM2837 | 4 @ 1.20GHz | Cortex-A53 | 1GB | Ubuntu 16.04 | ~5W |
Benchmarking FeatherCNN with F(6x6,3x3) on VGG16 with 5 devices and other AI chips.
Devices\Cores | 1 | 2 | 4 | 8 | GPU |
---|---|---|---|---|---|
S8 | 925 | 630 | 489 | 615 | |
IP7 | 374 | 284 | | | |
D05 | 755 | 399 | 226 | 149 | |
FT1500A | 1769 | 1020 | 639 | 444 | |
RK3399 | 1673 | 1420 | | | |
TensorFlow Lite on iPhone 7 | 905 | | | | |
ARM CL on RK3399 | 4103 | 1645 | | | |
TVM on RK3399 | - | 1293 | | | |
Intel Movidius | 812* | | | | |
Cambricon 1A | 100# | | | | |
- Intel Movidius (*) is a DSP accelerator running at FP16 precision.
- Cambricon 1A (#) is an NPU that takes 100 ms with the normal model (FP16) and 50 ms with the sparse model.
For comparison, we also tested several other libraries on the same devices as baselines, including Caffe + OpenBLAS, Caffe2 + Eigen and Caffe2 + NNPACK.
To evaluate the scalability of state-of-the-art CNN inference tools, we use the Huawei D05 server, a domestically made many-core ARM server with 64 Cortex-A72 cores. All 64 cores are interconnected by a token-ring network.
Network | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
---|---|---|---|---|---|---|---|
[VGG16] | 1333 | 697 | 385 | 218 | 157 | 117 | 102 |
[GoogleNet] | 333 | 210 | 154 | 125 | 126 | 151 | 230 |
[Resnet-50] | 573 | 356 | 187 | 117 | 104 | 65 | 194 |
[squeezenet] | 149 | 79 | 44 | 28 | 29 | 35 | 67 |
[mobilenet] | 124 | 70 | 42 | 36 | 34 | 52 | 76 |
[densenet-121] | 517 | 273 | 156 | 98 | 113 | 160 | 331 |
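
Scalability can be read off the table above by comparing each column with the single-core time. The following is a minimal sketch, not part of the benchmark code: it copies the VGG16 and GoogleNet rows, assumes the column headers are core counts, and computes the speedup over one core and the parallel efficiency (speedup divided by core count). The names `scaling` and `best_core_count` are illustrative only.

```python
# Times in milliseconds, copied from the VGG16 and GoogleNet rows above.
# Assumption: the column headers 1..64 are the number of cores used.
CORES = [1, 2, 4, 8, 16, 32, 64]
TIMES_MS = {
    "VGG16": [1333, 697, 385, 218, 157, 117, 102],
    "GoogleNet": [333, 210, 154, 125, 126, 151, 230],
}


def scaling(times_ms, cores=CORES):
    """Return (cores, speedup over 1 core, parallel efficiency) per column."""
    base = times_ms[0]
    return [(n, base / t, base / (t * n)) for n, t in zip(cores, times_ms)]


def best_core_count(times_ms, cores=CORES):
    """Return the core count with the lowest measured time."""
    return min(zip(times_ms, cores))[1]


for name, times in TIMES_MS.items():
    print(f"{name}: fastest at {best_core_count(times)} cores")
    for n, speedup, efficiency in scaling(times):
        print(f"  {n:>2} cores: {speedup:5.2f}x, efficiency {efficiency:.2f}")
```

With these rows, VGG16 keeps improving up to 64 cores, while GoogleNet is fastest at 8 cores and slows down beyond that.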
Network | 1 | 2 | 4 | 8 | 16 | 32 | 64 | speedup |
---|---|---|---|---|---|---|---|---|
[VGG16] | 3329 | 2227 | 1443 | 1108 | 1137 | 2109 | 3721 | 10.86 |
[GoogleNet] | 1028 | 929 | 861 | 831 | 822 | 848 | 857 | 13.7 |
[Resnet-50] | 728 | 490 | 347 | 278 | 252 | 346 | 365 | 3.88 |
[squeezenet] | 190 | 127 | 92 | 76 | 74 | 84 | 92 | 1.68 |
[mobilenet] | 211 | 166 | 146 | 139 | 137 | 153 | 184 | 4.03 |
[densenet-121] | 865 | 593 | 438 | 373 | 354 | 655 | 856 | 3.08 |
speedup is calculated as the minimum time usage of the given tool divided by the minimum time usage of FeatherCNN, each taken over all core counts.
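
The speedup definition can be sketched as a short calculation. This example assumes that the first D05 table (the one without a speedup column) holds the FeatherCNN times, which the source does not label explicitly, and it copies only three rows from that table and from the table directly above. The dictionary and function names are illustrative.

```python
# Times in milliseconds for 1, 2, 4, 8, 16, 32 and 64 cores.
# Assumption: the first D05 table is the FeatherCNN table.
FEATHERCNN_MS = {
    "VGG16": [1333, 697, 385, 218, 157, 117, 102],
    "Resnet-50": [573, 356, 187, 117, 104, 65, 194],
    "mobilenet": [124, 70, 42, 36, 34, 52, 76],
}
# Same rows copied from the table directly above (the tool being compared).
BASELINE_MS = {
    "VGG16": [3329, 2227, 1443, 1108, 1137, 2109, 3721],
    "Resnet-50": [728, 490, 347, 278, 252, 346, 365],
    "mobilenet": [211, 166, 146, 139, 137, 153, 184],
}


def speedup(baseline_ms, feathercnn_ms):
    """Best baseline time divided by best FeatherCNN time, over all core counts."""
    return min(baseline_ms) / min(feathercnn_ms)


for network in FEATHERCNN_MS:
    value = speedup(BASELINE_MS[network], FEATHERCNN_MS[network])
    print(f"{network}: {value:.2f}")
# VGG16: 10.86
# Resnet-50: 3.88
# mobilenet: 4.03
```

Note that the two minima may come from different core counts: for VGG16 the compared tool is fastest at 8 cores, while the assumed FeatherCNN row is fastest at 64.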
Network | 1 | 2 | 4 | 8 | 16 | 32 | 64 | speedup |
---|---|---|---|---|---|---|---|---|
[VGG16] | 3267 | 2173 | 1550 | 1310 | 1385 | 1323 | 1401 | 12.84 |
[GoogleNet] | 351 | 347 | 267 | 306 | 894 | 2422 | 3938 | 4.45 |
[Resnet-50] | 869 | 549 | 374 | 262 | 149 | 355 | 724 | 2.29 |
[squeezenet] | 91 | 65 | 55 | 87 | 221 | 628 | 723 | 1.25 |
[mobilenet] | 174 | 139 | 110 | 90 | 110 | 171 | 592 | 2.65 |
[densenet-121] | x | x | x | x | x | x | x | x |
x means that Caffe2 + Eigen could not run the densenet-121 network.
Because ARM uses a big.LITTLE architecture to save energy, we selected the RK3399, a widely used embedded development board, to evaluate how well the scheduling algorithm and blocking strategies adapt to this architecture. The RK3399 has 2 big cores at 1.8GHz and 4 little cores at 1.4GHz.
Network | 1 big | 2 big | 1 little | 2 little | 4 little | all | Memory (MB) |
---|---|---|---|---|---|---|---|
[VGG16] | 2268 | 1620 | 6122 | 3422 | 2269 | 1932 | 904 |
[GoogleNet] | 416 | 250 | 927 | 524 | 333 | 294 | 168 |
[Resnet-50] | 857 | 517 | 1834 | 1009 | 671 | 555 | 466 |
[squeezenet] | 236 | 144 | 539 | 315 | 210 | 172 | 404 |
[mobilenet] | 242 | 137 | 487 | 271 | 165 | 153 | 176 |
[densenet-121] | 842 | 543 | 1854 | 1050 | 686 | 543 | 111 |
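
To run a measurement on only the big or only the little cores, a benchmark first has to know which core ids belong to which cluster. The sketch below is one possible way to do that on Linux and is not taken from FeatherCNN: it groups cores by their maximum frequency, read from the cpufreq sysfs files when they exist. The core numbering in the example (ids 0-3 little, 4-5 big) is a hypothetical layout; only the 1.8GHz and 1.4GHz frequencies come from the RK3399 description above.

```python
from collections import defaultdict
from pathlib import Path


def read_max_freqs_khz(sysfs_root="/sys/devices/system/cpu"):
    """Read each core's maximum frequency in kHz from Linux cpufreq sysfs.

    Returns an empty dict if the files are not present on this system.
    """
    freqs = {}
    for path in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/cpuinfo_max_freq"):
        core_id = int(path.parts[-3][3:])  # directory name "cpu4" -> 4
        freqs[core_id] = int(path.read_text().strip())
    return freqs


def split_big_little(max_freq_khz):
    """Split core ids into (big, little); 'big' is the highest-frequency group."""
    clusters = defaultdict(list)
    for core_id, freq in max_freq_khz.items():
        clusters[freq].append(core_id)
    if not clusters:
        return [], []
    big_freq = max(clusters)
    big = sorted(clusters[big_freq])
    little = sorted(
        core for freq, ids in clusters.items() if freq != big_freq for core in ids
    )
    return big, little


# Hypothetical core numbering for an RK3399-like board (frequencies in kHz).
example = {0: 1400000, 1: 1400000, 2: 1400000, 3: 1400000, 4: 1800000, 5: 1800000}
big, little = split_big_little(example)
print("big cores:", big)
print("little cores:", little)
```

On Linux, the resulting list could then be passed to `os.sched_setaffinity(0, big)` before starting the timed run, so that the worker threads stay on one cluster.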
Network | 1 | 2 | 4 |
---|---|---|---|
[VGG16] | - | - | - |
[GoogleNet] | 1058 | 642 | 809 |
[Resnet-50] | 2107 | 1255 | 1540 |
[squeezenet] | 638 | 399 | 501 |
[mobilenet] | 451 | 275 | 206 |
[densenet-121] | 630 | 396 | 459 |
Network | 1 big | 2 big | 1 little | 2 little | 4 little | all |
---|---|---|---|---|---|---|
[VGG16] | 1325 | 706 | 2540 | 1507 | 1226 | 844 |
[GoogleNet] | 274 | 146 | 366 | 206 | 127 | 105 |
[Resnet-50] | 480 | 266 | 759 | 417 | 261 | 215 |
[squeezenet] | 88 | 115 | 73 | 61 | 204 | 153 |
[mobilenet] | 156 | 87 | 211 | 116 | 68 | 56 |
[densenet-121] | - | - | - | - | - | - |
Network | 1 big | 2 big | 1 little | 2 little | 4 little | all |
---|---|---|---|---|---|---|
[VGG16] | x | x | - | - | - | - |
[GoogleNet] | x | x | - | - | - | - |
[Resnet-50] | x | x | - | - | - | - |
[squeezenet] | x | x | x | - | - | - |
[mobilenet] | x | x | x | - | - | - |
[densenet-121] | - | - | - | - | - | - |