2.10. Other networks

2.10.1. Overview

In 2012, AlexNet, proposed by Alex Krizhevsky et al., won the ImageNet competition by a wide margin over the runner-up, drawing broad attention to convolutional neural networks and to deep learning as a whole. AlexNet used ReLU as the CNN activation function, which mitigates the vanishing gradient problem that sigmoid suffers from in deep networks. During training, Dropout randomly deactivates a portion of the neurons to reduce overfitting. The network also replaces the average pooling that was common in CNNs at the time with overlapping max pooling, which avoids the blurring effect of average pooling and enriches the features. In a sense, AlexNet set off the current wave of research on and applications of neural networks.
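The two design choices above can be illustrated with a small, framework-free sketch. It is not AlexNet code: the helper names (`chained_grad`, `pool1d`) and the toy signal are assumptions made for illustration, and weights are ignored when chaining derivatives. The window size 3 with stride 2 matches the overlapping pooling setting described in the AlexNet paper.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid function at x (never larger than 0.25)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU at x: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives (weights are ignored)."""
    return grad_fn(pre_activation) ** depth

def pool1d(values, size, stride, reduce_fn):
    """Slide a window over `values` and reduce each window with `reduce_fn`."""
    return [reduce_fn(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]

def mean(window):
    return sum(window) / len(window)

# Best case for sigmoid (x = 0, derivative 0.25) across 10 layers.
sigmoid_chain = chained_grad(sigmoid_grad, 0.0, 10)
# ReLU keeps a derivative of 1 on the active path, so nothing shrinks.
relu_chain = chained_grad(relu_grad, 1.0, 10)

# Window 3 with stride 2: neighbouring windows overlap by one element.
signal = [0, 0, 9, 0, 0, 0, 0]
max_pooled = pool1d(signal, 3, 2, max)
avg_pooled = pool1d(signal, 3, 2, mean)

print(sigmoid_chain, relu_chain)
print(max_pooled, avg_pooled)
```

Even in its best case the sigmoid derivative chain shrinks geometrically with depth, while max pooling keeps the peak value that average pooling smears across the window.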

SqueezeNet achieves the same accuracy as AlexNet on ImageNet-1k with only about 1/50 of the parameters. The core of the network is the Fire module, which uses 1x1 convolutions to reduce the channel dimension and thereby greatly cuts the number of parameters. The authors built SqueezeNet by stacking a large number of Fire modules.
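A rough parameter count shows why the 1x1 reduction helps. The sketch below assumes the Fire layout from the SqueezeNet paper (a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers whose outputs are concatenated) and uses channel sizes chosen for illustration; biases are ignored, and the function names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weight count of a kernel x kernel convolution, biases ignored."""
    return in_ch * out_ch * kernel * kernel

def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Weight count of a Fire module: 1x1 squeeze, then 1x1 and 3x3 expand."""
    squeeze_w = conv_params(in_ch, squeeze, 1)
    expand1_w = conv_params(squeeze, expand1x1, 1)
    expand3_w = conv_params(squeeze, expand3x3, 3)
    return squeeze_w + expand1_w + expand3_w

# Illustrative sizes: 96 input channels, 16 squeeze channels,
# 64 + 64 = 128 output channels after concatenation.
fire = fire_params(96, 16, 64, 64)
# A plain 3x3 convolution with the same input and output widths.
plain = conv_params(96, 128, 3)

print(fire, plain, plain / fire)
```

Under these assumptions the Fire module needs 11,776 weights against 110,592 for the plain 3x3 layer, because the expensive 3x3 filters only see the 16 squeezed channels.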

VGG is a convolutional neural network developed by researchers at the University of Oxford's Visual Geometry Group and DeepMind. The network explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking small 3x3 convolution kernels and 2x2 max pooling layers, the authors successfully built deep multi-layer networks that converge to good accuracy. VGG finished second in the ILSVRC 2014 classification task and first in the localization task.
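The appeal of stacking small kernels can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions with equal input and output channel counts and ignores biases; the helper names and the channel width of 64 are illustrative choices, not values taken from this document.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)

def stacked_params(kernel, layers, channels):
    """Weight count of the stack when every layer maps channels -> channels."""
    return layers * kernel * kernel * channels * channels

channels = 64
# Three 3x3 layers cover the same 7x7 region as a single 7x7 layer.
rf_three_3x3 = stacked_receptive_field(3, 3)
rf_one_7x7 = stacked_receptive_field(7, 1)
params_three_3x3 = stacked_params(3, 3, channels)
params_one_7x7 = stacked_params(7, 1, channels)

print(rf_three_3x3, rf_one_7x7)
print(params_three_3x3, params_one_7x7)
```

With the same receptive field, the 3x3 stack uses 27 weights per channel pair instead of 49, and it adds two extra nonlinearities along the way.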

DarkNet53 was designed by the YOLO author as the backbone for object detection in the YOLOv3 paper. The network is built mainly from 1x1 and 3x3 convolutions and has 53 layers in total, hence the name DarkNet53.
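The layer count can be reproduced with a short tally. The stage layout used here (residual blocks per stage of 1, 2, 8, 8 and 4, each block made of a 1x1 and a 3x3 convolution, plus a stem convolution, one downsampling convolution per stage and a final fully connected layer) is an assumption taken from the YOLOv3 paper rather than from this document, and the function name is hypothetical.

```python
# Assumed stage layout: residual blocks per stage, as listed in the YOLOv3 paper.
RESIDUAL_BLOCKS_PER_STAGE = [1, 2, 8, 8, 4]

def darknet53_layer_count(blocks_per_stage):
    """Count convolutional layers plus the final fully connected layer."""
    convs = 1  # stem 3x3 convolution
    for blocks in blocks_per_stage:
        convs += 1           # 3x3 stride-2 downsampling convolution
        convs += 2 * blocks  # each residual block: 1x1 conv, then 3x3 conv
    return convs + 1  # final fully connected classification layer

total_layers = darknet53_layer_count(RESIDUAL_BLOCKS_PER_STAGE)
print(total_layers)
```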

2.10.2. Accuracy, FLOPs and Parameters

Models	Top1	Top5	Reference top1	Reference top5	FLOPs (G)	Parameters (M)
AlexNet	0.567	0.792	0.5720		1.370	61.090
SqueezeNet1_0	0.596	0.817	0.575		1.550	1.240
SqueezeNet1_1	0.601	0.819			0.690	1.230
VGG11	0.693	0.891			15.090	132.850
VGG13	0.700	0.894			22.480	133.030
VGG16	0.720	0.907	0.715	0.901	30.810	138.340
VGG19	0.726	0.909			39.130	143.650
DarkNet53	0.780	0.941	0.772	0.938	18.580	41.600
ResNet50_ACNet	0.767	0.932			10.730	33.110
ResNet50_ACNet_deploy	0.767	0.932			8.190	25.550
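
The table can be used to check two claims numerically: the roughly 1/50 parameter ratio between SqueezeNet and AlexNet, and the cost reduction of the deploy form of ResNet50_ACNet. The sketch below copies the FLOPs and parameter columns for four rows; the dictionary and function names are illustrative only.

```python
# (FLOPs in G, parameters in M), copied from the table above.
MODELS = {
    "AlexNet": (1.370, 61.090),
    "SqueezeNet1_0": (1.550, 1.240),
    "ResNet50_ACNet": (10.730, 33.110),
    "ResNet50_ACNet_deploy": (8.190, 25.550),
}

def param_ratio(big, small):
    """How many times more parameters `big` has than `small`."""
    return MODELS[big][1] / MODELS[small][1]

def flops_saving(before, after):
    """Fraction of FLOPs removed when going from `before` to `after`."""
    return 1.0 - MODELS[after][0] / MODELS[before][0]

alexnet_vs_squeezenet = param_ratio("AlexNet", "SqueezeNet1_0")
acnet_deploy_saving = flops_saving("ResNet50_ACNet", "ResNet50_ACNet_deploy")

print(round(alexnet_vs_squeezenet, 1))
print(round(acnet_deploy_saving, 3))
```

The parameter ratio comes out close to 50, in line with the overview, while the deploy variant reports the same Top1 and Top5 at a lower FLOPs and parameter cost.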

2.10.3. Inference speed based on V100 GPU

Models	Crop Size	Resize Short Size	FP32 Batch Size=1 (ms)
AlexNet	224	256	1.176
SqueezeNet1_0	224	256	0.860
SqueezeNet1_1	224	256	0.763
VGG11	224	256	1.867
VGG13	224	256	2.148
VGG16	224	256	2.616
VGG19	224	256	3.076
DarkNet53	256	256	3.139
ResNet50_ACNet_deploy	224	256	5.626

2.10.4. Inference speed based on T4 GPU

Models	Crop Size	Resize Short Size	FP16 Batch Size=1 (ms)	FP16 Batch Size=4 (ms)	FP16 Batch Size=8 (ms)	FP32 Batch Size=1 (ms)	FP32 Batch Size=4 (ms)	FP32 Batch Size=8 (ms)
AlexNet	224	256	1.06447	1.70435	2.38402	1.44993	2.46696	3.72085
SqueezeNet1_0	224	256	0.97162	2.06719	3.67499	0.96736	2.53221	4.54047
SqueezeNet1_1	224	256	0.81378	1.62919	2.68044	0.76032	1.877	3.15298
VGG11	224	256	2.24408	4.67794	7.6568	3.90412	9.51147	17.14168
VGG13	224	256	2.58589	5.82708	10.03591	4.64684	12.61558	23.70015
VGG16	224	256	3.13237	7.19257	12.50913	5.61769	16.40064	32.03939
VGG19	224	256	3.69987	8.59168	15.07866	6.65221	20.4334	41.55902
DarkNet53	256	256	3.18101	5.88419	10.14964	4.10829	12.1714	22.15266
ResNet50_ACNet	256	256	3.89002	4.58195	9.01095	5.33395	10.96843	18.70368
ResNet50_ACNet_deploy	224	256	2.6823	5.944	7.16655	3.49161	7.78374	13.94361
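
Latency per batch is easier to compare once it is converted into throughput and FP16 speedup. The sketch below does this for two rows copied from the T4 table; the data structure and function names are illustrative, and throughput is simply assumed to be batch size divided by the reported latency.

```python
BATCH_SIZES = (1, 4, 8)

# Latencies in ms for batch sizes 1, 4 and 8, copied from the T4 table above.
T4_LATENCY_MS = {
    "AlexNet": {
        "fp16": (1.06447, 1.70435, 2.38402),
        "fp32": (1.44993, 2.46696, 3.72085),
    },
    "VGG16": {
        "fp16": (3.13237, 7.19257, 12.50913),
        "fp32": (5.61769, 16.40064, 32.03939),
    },
}

def throughput(model, precision):
    """Images per second at each batch size."""
    latencies = T4_LATENCY_MS[model][precision]
    return [1000.0 * batch / ms for batch, ms in zip(BATCH_SIZES, latencies)]

def fp16_speedup(model):
    """FP32 latency divided by FP16 latency at each batch size."""
    fp16 = T4_LATENCY_MS[model]["fp16"]
    fp32 = T4_LATENCY_MS[model]["fp32"]
    return [slow / fast for slow, fast in zip(fp32, fp16)]

vgg16_fp16_throughput = throughput("VGG16", "fp16")
vgg16_speedup = fp16_speedup("VGG16")

for batch, ips, gain in zip(BATCH_SIZES, vgg16_fp16_throughput, vgg16_speedup):
    print(f"VGG16 batch={batch}: {ips:.1f} img/s in FP16, {gain:.2f}x over FP32")
```

For VGG16 both throughput and the FP16 advantage grow with batch size, which suggests that larger batches make better use of the T4 in half precision.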