2.4. Mobile and Embedded Vision Applications Network series

2.4.1. Overview

MobileNetV1 is a network released by Google in 2017 for mobile and embedded devices. It replaces the traditional convolution operation with depthwise separable convolution, that is, a depthwise convolution followed by a pointwise (1x1) convolution. Compared with traditional convolution, this combination greatly reduces both the number of parameters and the amount of computation. MobileNetV1 can also be used for object detection, image segmentation, and other vision tasks.
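The parameter savings can be checked with a short NumPy sketch. It implements a depthwise separable convolution as a per-channel k x k convolution followed by a 1x1 pointwise convolution and compares its weight count with a standard k x k convolution. Stride 1, zero padding, and the absence of bias, batch normalization, and activation are simplifications of this sketch, and the function names are hypothetical helpers, not PaddleClas APIs.

```python
import numpy as np


def depthwise_conv(x, w):
    """Depthwise k x k convolution (cross-correlation), stride 1, zero 'same' padding.

    x: input of shape (C, H, W); w: one k x k filter per channel, shape (C, k, k), k odd.
    """
    c, h, wd = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on the spatial axes only
    out = np.zeros((c, h, wd), dtype=np.result_type(x, w))
    for i in range(k):
        for j in range(k):
            # each channel is filtered by its own kernel only: no mixing across channels
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def pointwise_conv(x, w):
    """1x1 convolution that mixes channels; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def param_counts(c_in, c_out, k):
    """Weights (no bias) of a standard k x k conv vs. a depthwise separable conv."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out  # depthwise part + pointwise part
    return standard, separable


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 14, 14))
    y = depthwise_conv(x, rng.standard_normal((32, 3, 3)))
    y = pointwise_conv(y, rng.standard_normal((64, 32)))
    print(y.shape)  # (64, 14, 14)
    print(param_counts(32, 64, 3))  # (18432, 2336)
```

For 32 input channels, 64 output channels and a 3x3 kernel, the standard convolution needs 18432 weights versus 2336 for the separable one, a ratio of 1/C_out + 1/k² ≈ 0.127. The same ratio holds for multiply-accumulates per output position, which is where the computational savings come from.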

MobileNetV2 is a lightweight network proposed by Google as the successor to MobileNetV1. It introduces the linear bottleneck and the inverted residual block as its basic building blocks, and the full MobileNetV2 architecture is built by stacking many of these blocks. As a result, MobileNetV2 achieves higher classification accuracy than MobileNetV1 with only about half of its FLOPs.
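A simplified NumPy sketch of one stride-1 inverted residual block, following the structure described in the MobileNetV2 paper: a 1x1 expansion convolution with ReLU6, a 3x3 depthwise convolution with ReLU6, and a 1x1 projection convolution with no activation (the linear bottleneck), plus a shortcut between the thin input and output. Batch normalization, biases, and the stride-2 variant without a shortcut are left out, and the expansion factor `t` and the weight shapes are illustrative assumptions rather than PaddleClas code.

```python
import numpy as np


def relu6(x):
    return np.clip(x, 0.0, 6.0)


def depthwise3x3(x, w):
    """Per-channel 3x3 convolution, stride 1, zero 'same' padding. x: (C, H, W), w: (C, 3, 3)."""
    c, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c, h, wd))
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def inverted_residual(x, w_expand, w_dw, w_project):
    """Simplified stride-1 inverted residual block with a linear bottleneck.

    x: (C, H, W); w_expand: (t*C, C); w_dw: (t*C, 3, 3); w_project: (C, t*C).
    """
    # 1) expand: 1x1 conv widens the thin bottleneck by the expansion factor t
    hidden = relu6(np.einsum("oc,chw->ohw", w_expand, x))
    # 2) depthwise 3x3 conv filters each expanded channel separately
    hidden = relu6(depthwise3x3(hidden, w_dw))
    # 3) project back to C channels with NO activation: the linear bottleneck
    hidden = np.einsum("oc,chw->ohw", w_project, hidden)
    # 4) shortcut between the thin bottleneck tensors
    return x + hidden


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, t = 16, 6  # bottleneck width and expansion factor (illustrative values)
    x = rng.standard_normal((c, 14, 14))
    y = inverted_residual(
        x,
        rng.standard_normal((t * c, c)),
        rng.standard_normal((t * c, 3, 3)),
        rng.standard_normal((c, t * c)),
    )
    print(y.shape)  # (16, 14, 14)
```

The shortcut connects the narrow bottleneck tensors rather than the wide expanded ones, which is the "inverted" part compared with a classic residual block; keeping the projection linear follows the MobileNetV2 paper's argument that a ReLU in the low-dimensional space discards information.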

The ShuffleNet series consists of lightweight networks proposed by MEGVII. So far it includes two typical structures, ShuffleNetV1 and ShuffleNetV2. The Channel Shuffle operation in ShuffleNet exchanges information between channel groups while keeping the network trainable end to end. In the ShuffleNetV2 paper, the authors propose four guidelines for designing lightweight networks, and design ShuffleNetV2 based on these guidelines and on the shortcomings of ShuffleNetV1.
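Channel shuffle is a fixed permutation of channels with no learnable parameters, so gradients pass straight through it and the network can still be trained end to end. A minimal NumPy sketch of the reshape, transpose, reshape formulation described in the ShuffleNet paper, assuming an NCHW tensor layout (the function name is illustrative):

```python
import numpy as np


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Channel shuffle for a tensor of shape (N, C, H, W).

    Split channels into (groups, C // groups), swap those two axes, and flatten
    back, so each output group receives channels from every input group.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap the group axis and the per-group channel axis
    return x.reshape(n, c, h, w)


if __name__ == "__main__":
    x = np.arange(6).reshape(1, 6, 1, 1)  # channel i holds the value i
    print(channel_shuffle(x, 2).ravel().tolist())  # [0, 3, 1, 4, 2, 5]
```

With 6 channels in 2 groups, channels `[0, 1, 2]` and `[3, 4, 5]` are interleaved into `[0, 3, 1, 4, 2, 5]`, so the next group convolution sees channels from both original groups.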

MobileNetV3 is a lightweight network proposed by Google in 2019 and designed with neural architecture search (NAS). To further improve accuracy, the relu and sigmoid activation functions were replaced with hard_swish and hard_sigmoid, and several other strategies were introduced to reduce the amount of computation.
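Both replacements are piecewise-linear functions built from ReLU6, so they avoid the exponential that sigmoid and swish require. The sketch below uses the definitions from the MobileNetV3 paper, hard_sigmoid(x) = ReLU6(x + 3) / 6 and hard_swish(x) = x * hard_sigmoid(x); framework implementations may expose slightly different constants, so treat this as an illustration rather than the exact PaddleClas code.

```python
import numpy as np


def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)


def hard_sigmoid(x):
    # piecewise-linear stand-in for sigmoid: 0 for x <= -3, 1 for x >= 3, linear in between
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    # stand-in for swish(x) = x * sigmoid(x), using hard_sigmoid instead of sigmoid
    return x * hard_sigmoid(x)


def swish(x):
    # reference smooth function, for comparison only
    return x / (1.0 + np.exp(-x))


if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 13)
    print(hard_swish(xs))
    print(np.max(np.abs(hard_swish(xs) - swish(xs))))
```

The hard versions match the smooth ones exactly in the saturated regions and differ only slightly near x = ±3, while needing only additions, comparisons, and multiplications, which are cheap on mobile hardware.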

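[Figure: mobile models on ARM, top-1 accuracy (mobile_arm_top1.png)]

[Figure: mobile models on ARM, storage size (mobile_arm_storage.png)]

[Figure: mobile models on T4 GPU, FP32, batch size 4, TensorRT, by FLOPs (t4.fp32.bs4.mobile_trt.flops.png)]

[Figure: mobile models on T4 GPU, FP32, batch size 4, TensorRT, by parameters (t4.fp32.bs4.mobile_trt.params.png)]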

PaddleClas currently open-sources 32 pretrained models of the mobile series, and their metrics are shown in the figures above. As the figures show, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight architecture among them.

In MobileNetV3, the authors add a 1x1 convolution after global average pooling to obtain higher accuracy. This operation significantly increases the number of parameters but has little impact on the amount of computation. Therefore, when models are compared by storage size, MobileNetV3 has little advantage, but its smaller computation gives it faster inference on ARM devices. In addition, the SSLD distillation models in our model library perform excellently, setting new accuracy records for lightweight models from several perspectives. However, the complex structure and many branches of MobileNetV3 are not GPU friendly, so its GPU inference speed is lower than that of MobileNetV1.
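To see why the extra 1x1 convolution adds many parameters but little computation, the sketch below counts the weights and multiply-accumulates (MACs) of a 1x1 convolution. The channel sizes (960 channels, pooled, then 1x1 convolutions to 1280 and to 1000 classes) are taken from the MobileNetV3-Large architecture table in the MobileNetV3 paper; biases are ignored, and PaddleClas's own counting may differ slightly. The helper name is hypothetical.

```python
def conv1x1_cost(c_in: int, c_out: int, h: int = 1, w: int = 1):
    """Weight count and MACs of a 1x1 conv on an h x w feature map (biases ignored).

    The weight count does not depend on spatial size; MACs scale with h * w.
    """
    params = c_in * c_out
    macs = c_in * c_out * h * w
    return params, macs


# MobileNetV3-Large head: 960 channels -> global average pooling -> 1x1 conv to 1280
# -> 1x1 conv to 1000 classes. After pooling, the feature map is 1 x 1.
head_params = 0
head_macs = 0
for c_in, c_out in [(960, 1280), (1280, 1000)]:
    p, m = conv1x1_cost(c_in, c_out)
    head_params += p
    head_macs += m

# For comparison: the same 960 -> 1280 conv applied to the 7 x 7 map before pooling.
_, macs_before_pool = conv1x1_cost(960, 1280, 7, 7)

print(f"head params: {head_params / 1e6:.2f}M")  # head params: 2.51M
print(f"head MACs after pooling: {head_macs / 1e6:.2f}M")  # head MACs after pooling: 2.51M
print(f"960->1280 MACs before pooling: {macs_before_pool / 1e6:.2f}M")  # 60.21M
```

The head holds about 2.51M weights, nearly half of the 5.47M parameters listed for MobileNetV3_large_x1_0 in the table below, yet costs only a few million MACs because it runs on a 1 x 1 map; applying the same convolution before pooling would cost 49 times as much.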

2.4.2. Accuracy, FLOPs and Parameters

Models	Top1	Top5	Reference top1	Reference top5	FLOPs (G)	Parameters (M)
MobileNetV1_x0_25	0.514	0.755	0.506		0.070	0.460
MobileNetV1_x0_5	0.635	0.847	0.637		0.280	1.310
MobileNetV1_x0_75	0.688	0.882	0.684		0.630	2.550
MobileNetV1	0.710	0.897	0.706		1.110	4.190
MobileNetV1_ssld	0.779	0.939			1.110	4.190
MobileNetV2_x0_25	0.532	0.765			0.050	1.500
MobileNetV2_x0_5	0.650	0.857	0.654	0.864	0.170	1.930
MobileNetV2_x0_75	0.698	0.890	0.698	0.896	0.350	2.580
MobileNetV2	0.722	0.907	0.718	0.910	0.600	3.440
MobileNetV2_x1_5	0.741	0.917			1.320	6.760
MobileNetV2_x2_0	0.752	0.926			2.320	11.130
MobileNetV2_ssld	0.7674	0.9339			0.600	3.440
MobileNetV3_large_x1_25	0.764	0.930	0.766		0.714	7.440
MobileNetV3_large_x1_0	0.753	0.923	0.752		0.450	5.470
MobileNetV3_large_x0_75	0.731	0.911	0.733		0.296	3.910
MobileNetV3_large_x0_5	0.692	0.885	0.688		0.138	2.670
MobileNetV3_large_x0_35	0.643	0.855	0.642		0.077	2.100
MobileNetV3_small_x1_25	0.707	0.895	0.704		0.195	3.620
MobileNetV3_small_x1_0	0.682	0.881	0.675		0.123	2.940
MobileNetV3_small_x0_75	0.660	0.863	0.654		0.088	2.370
MobileNetV3_small_x0_5	0.592	0.815	0.580		0.043	1.900
MobileNetV3_small_x0_35	0.530	0.764	0.498		0.026	1.660
MobileNetV3_large_x1_0_ssld	0.790	0.945			0.450	5.470
MobileNetV3_large_x1_0_ssld_int8	0.761
MobileNetV3_small_x1_0_ssld	0.713	0.901			0.123	2.940
ShuffleNetV2	0.688	0.885	0.694		0.280	2.260
ShuffleNetV2_x0_25	0.499	0.738			0.030	0.600
ShuffleNetV2_x0_33	0.537	0.771			0.040	0.640
ShuffleNetV2_x0_5	0.603	0.823	0.603		0.080	1.360
ShuffleNetV2_x1_5	0.716	0.902	0.726		0.580	3.470
ShuffleNetV2_x2_0	0.732	0.912	0.749		1.120	7.320
ShuffleNetV2_swish	0.700	0.892			0.290	2.260
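The FLOPs of the MobileNetV1 rows follow the rule of thumb from the MobileNetV1 paper that a width multiplier α reduces computation roughly by α²: most of the cost sits in pointwise convolutions whose cost is proportional to C_in × C_out, and both channel counts are multiplied by α. A quick check against the table (the dictionary values are copied from the rows above; the helper name is hypothetical):

```python
# FLOPs (G) of the MobileNetV1 family, copied from the table above: width multiplier -> FLOPs
flops_table = {1.0: 1.11, 0.75: 0.63, 0.5: 0.28, 0.25: 0.07}


def estimate_flops(alpha: float, base_flops: float) -> float:
    """Approximate FLOPs of a width-alpha model from the alpha = 1.0 model.

    Pointwise convs dominate and their cost scales with C_in * C_out, so the
    total shrinks by about alpha ** 2.
    """
    return alpha ** 2 * base_flops


for alpha, reported in flops_table.items():
    est = estimate_flops(alpha, flops_table[1.0])
    print(f"x{alpha}: estimated {est:.3f} G, reported {reported:.2f} G")
```

The estimates land within about 1% of the reported values. Parameter counts shrink more slowly than α², partly because the classifier's 1000 output units do not scale with α.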

2.4.3. Inference speed and storage size based on Snapdragon 855 (SD855)

Models	Batch Size=1 (ms)	Storage Size (MB)
MobileNetV1_x0_25	3.220	1.900
MobileNetV1_x0_5	9.580	5.200
MobileNetV1_x0_75	19.436	10.000
MobileNetV1	32.523	16.000
MobileNetV1_ssld	32.523	16.000
MobileNetV2_x0_25	3.799	6.100
MobileNetV2_x0_5	8.702	7.800
MobileNetV2_x0_75	15.531	10.000
MobileNetV2	23.318	14.000
MobileNetV2_x1_5	45.624	26.000
MobileNetV2_x2_0	74.292	43.000
MobileNetV2_ssld	23.318	14.000
MobileNetV3_large_x1_25	28.218	29.000
MobileNetV3_large_x1_0	19.308	21.000
MobileNetV3_large_x0_75	13.565	16.000
MobileNetV3_large_x0_5	7.493	11.000
MobileNetV3_large_x0_35	5.137	8.600
MobileNetV3_small_x1_25	9.275	14.000
MobileNetV3_small_x1_0	6.546	12.000
MobileNetV3_small_x0_75	5.284	9.600
MobileNetV3_small_x0_5	3.352	7.800
MobileNetV3_small_x0_35	2.635	6.900
MobileNetV3_large_x1_0_ssld	19.308	21.000
MobileNetV3_large_x1_0_ssld_int8	14.395	10.000
MobileNetV3_small_x1_0_ssld	6.546	12.000
ShuffleNetV2	10.941	9.000
ShuffleNetV2_x0_25	2.329	2.700
ShuffleNetV2_x0_33	2.643	2.800
ShuffleNetV2_x0_5	4.261	5.600
ShuffleNetV2_x1_5	19.352	14.000
ShuffleNetV2_x2_0	34.770	28.000
ShuffleNetV2_swish	16.023	9.100
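For the FP32 models, the storage column tracks the parameter counts from section 2.4.2 closely: at 4 bytes per 32-bit weight, one million parameters take about 4 MB. A quick check, assuming the storage column is in megabytes (the helper name is hypothetical):

```python
# (parameters in millions from section 2.4.2, storage size in MB from the table above)
models = {
    "MobileNetV1": (4.19, 16.0),
    "MobileNetV2": (3.44, 14.0),
    "MobileNetV3_large_x1_0": (5.47, 21.0),
    "ShuffleNetV2": (2.26, 9.0),
}

BYTES_PER_FP32 = 4


def fp32_size_mb(params_millions: float) -> float:
    """Rough model size if every parameter is stored as a 32-bit float."""
    return params_millions * BYTES_PER_FP32  # 1e6 params * 4 bytes = 4e6 bytes = 4 MB


for name, (params_m, storage_mb) in models.items():
    est = fp32_size_mb(params_m)
    print(f"{name}: estimated {est:.1f} MB, reported {storage_mb:.1f} MB")
```

The estimates fall within about 5% of the listed sizes; the remaining gap depends on the file format and unit convention, which the table does not specify. The rule does not carry over directly to MobileNetV3_large_x1_0_ssld_int8, which is listed at 10 MB, roughly half rather than a quarter of the 21 MB FP32 model.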

2.4.4. Inference speed based on T4 GPU

Models	FP16 Batch Size=1 (ms)	FP16 Batch Size=4 (ms)	FP16 Batch Size=8 (ms)	FP32 Batch Size=1 (ms)	FP32 Batch Size=4 (ms)	FP32 Batch Size=8 (ms)
MobileNetV1_x0_25	0.68422	1.13021	1.72095	0.67274	1.226	1.84096
MobileNetV1_x0_5	0.69326	1.09027	1.84746	0.69947	1.43045	2.39353
MobileNetV1_x0_75	0.6793	1.29524	2.15495	0.79844	1.86205	3.064
MobileNetV1	0.71942	1.45018	2.47953	0.91164	2.26871	3.90797
MobileNetV1_ssld	0.71942	1.45018	2.47953	0.91164	2.26871	3.90797
MobileNetV2_x0_25	2.85399	3.62405	4.29952	2.81989	3.52695	4.2432
MobileNetV2_x0_5	2.84258	3.1511	4.10267	2.80264	3.65284	4.31737
MobileNetV2_x0_75	2.82183	3.27622	4.98161	2.86538	3.55198	5.10678
MobileNetV2	2.78603	3.71982	6.27879	2.62398	3.54429	6.41178
MobileNetV2_x1_5	2.81852	4.87434	8.97934	2.79398	5.30149	9.30899
MobileNetV2_x2_0	3.65197	6.32329	11.644	3.29788	7.08644	12.45375
MobileNetV2_ssld	2.78603	3.71982	6.27879	2.62398	3.54429	6.41178
MobileNetV3_large_x1_25	2.34387	3.16103	4.79742	2.35117	3.44903	5.45658
MobileNetV3_large_x1_0	2.20149	3.08423	4.07779	2.04296	2.9322	4.53184
MobileNetV3_large_x0_75	2.1058	2.61426	3.61021	2.0006	2.56987	3.78005
MobileNetV3_large_x0_5	2.06934	2.77341	3.35313	2.11199	2.88172	3.19029
MobileNetV3_large_x0_35	2.14965	2.7868	3.36145	1.9041	2.62951	3.26036
MobileNetV3_small_x1_25	2.06817	2.90193	3.5245	2.02916	2.91866	3.34528
MobileNetV3_small_x1_0	1.73933	2.59478	3.40276	1.74527	2.63565	3.28124
MobileNetV3_small_x0_75	1.80617	2.64646	3.24513	1.93697	2.64285	3.32797
MobileNetV3_small_x0_5	1.95001	2.74014	3.39485	1.88406	2.99601	3.3908
MobileNetV3_small_x0_35	2.10683	2.94267	3.44254	1.94427	2.94116	3.41082
MobileNetV3_large_x1_0_ssld	2.20149	3.08423	4.07779	2.04296	2.9322	4.53184
MobileNetV3_small_x1_0_ssld	1.73933	2.59478	3.40276	1.74527	2.63565	3.28124
ShuffleNetV2	1.95064	2.15928	2.97169	1.89436	2.26339	3.17615
ShuffleNetV2_x0_25	1.43242	2.38172	2.96768	1.48698	2.29085	2.90284
ShuffleNetV2_x0_33	1.69008	2.65706	2.97373	1.75526	2.85557	3.09688
ShuffleNetV2_x0_5	1.48073	2.28174	2.85436	1.59055	2.18708	3.09141
ShuffleNetV2_x1_5	1.51054	2.4565	3.41738	1.45389	2.5203	3.99872
ShuffleNetV2_x2_0	1.95616	2.44751	4.19173	2.15654	3.18247	5.46893
ShuffleNetV2_swish	2.50213	2.92881	3.474	2.5129	2.97422	3.69357
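Latency alone hides the effect of batching. Assuming each entry is the time to process one whole batch, throughput in images per second is batch size × 1000 / latency in ms. The sketch below applies this to two FP32 rows of the table (the dictionary and helper are illustrative):

```python
# FP32 latencies (ms) on T4 from the table above: batch size -> latency per batch
latency_ms = {
    "MobileNetV1": {1: 0.91164, 4: 2.26871, 8: 3.90797},
    "MobileNetV3_large_x1_0": {1: 2.04296, 4: 2.9322, 8: 4.53184},
}


def throughput(batch_size: int, latency: float) -> float:
    """Images per second when one batch of `batch_size` images takes `latency` ms."""
    return batch_size * 1000.0 / latency


for name, by_batch in latency_ms.items():
    for bs, ms in by_batch.items():
        print(f"{name} bs={bs}: {throughput(bs, ms):.0f} images/s")
```

For MobileNetV1 this gives about 1097 images/s at batch size 1 and about 2047 images/s at batch size 8, while MobileNetV3_large_x1_0 reaches about 489 and 1765 images/s. This matches the overview's point that MobileNetV3 runs slower than MobileNetV1 on GPU despite having fewer FLOPs.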