2.6. Inception series¶

2.6.1. Overview¶

GoogLeNet is a new neural network structure designed by Google in 2014, which, together with VGG network, became the twin champions of the ImageNet challenge that year. GoogLeNet introduces the Inception structure for the first time, and stacks the Inception structure in the network so that the number of network layers reaches 22, which is also the mark of the convolutional network exceeding 20 layers for the first time. Since 1x1 convolution is used in the Inception structure to reduce the dimension of channel number, and Global pooling is used to replace the traditional method of processing features in multiple fc layers, the final GoogLeNet network has much less FLOPS and parameters than VGG network, which has become a beautiful scenery of neural network design at that time.

Xception is another improvement to InceptionV3 that Google proposed after Inception. In Xception, the author used the depthwise separable convolution to replace the traditional convolution operation, which greatly saved the network FLOPS and the number of parameters, but improved the accuracy. In DeeplabV3+, the author further improved the Xception and increased the number of Xception layers, and designed the network of Xception65 and Xception71.

InceptionV4 is a new neural network designed by Google in 2016, when residual structure were all the rage, but the authors believe that high performance can be achieved using only Inception structure. InceptionV4 uses more Inception structure to achieve even greater precision on Imagenet-1k.

The FLOPS, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

../_images/t4.fp32.bs4.Inception.flops.png

../_images/t4.fp32.bs4.Inception.params.png

../_images/t4.fp32.bs4.Inception.png

../_images/t4.fp16.bs4.Inception.png

The figure above reflects the relationship between the accuracy of Xception series and InceptionV4 and other indicators. Among them, Xception_deeplab is consistent with the structure of the paper, and Xception is an improved model developed by PaddleClas, which improves the accuracy by about 0.6% when the inference speed is basically unchanged. Details of the improved model are being updated, so stay tuned.

2.6.2. Accuracy, FLOPS and Parameters¶

Models	Top1	Top5	Reference top1	Reference top5	FLOPS (G)	Parameters (M)
GoogLeNet	0.707	0.897	0.698		2.880	8.460
Xception41	0.793	0.945	0.790	0.945	16.740	22.690
Xception41 _deeplab	0.796	0.944			18.160	26.730
Xception65	0.810	0.955			25.950	35.480
Xception65 _deeplab	0.803	0.945			27.370	39.520
Xception71	0.811	0.955			31.770	37.280
InceptionV4	0.808	0.953	0.800	0.950	24.570	42.680

2.6.3. Inference speed based on V100 GPU¶

Models	Crop Size	Resize Short Size	FP32 Batch Size=1 (ms)
GoogLeNet	224	256	1.807
Xception41	299	320	3.972
Xception41_ deeplab	299	320	4.408
Xception65	299	320	6.174
Xception65_ deeplab	299	320	6.464
Xception71	299	320	6.782
InceptionV4	299	320	11.141

2.6.4. Inference speed based on T4 GPU¶

Models	Crop Size	Resize Short Size	FP16 Batch Size=1 (ms)	FP16 Batch Size=4 (ms)	FP16 Batch Size=8 (ms)	FP32 Batch Size=1 (ms)	FP32 Batch Size=4 (ms)	FP32 Batch Size=8 (ms)
GoogLeNet	299	320	1.75451	3.39931	4.71909	1.88038	4.48882	6.94035
Xception41	299	320	2.91192	7.86878	15.53685	4.96939	17.01361	32.67831
Xception41_ deeplab	299	320	2.85934	7.2075	14.01406	5.33541	17.55938	33.76232
Xception65	299	320	4.30126	11.58371	23.22213	7.26158	25.88778	53.45426
Xception65_ deeplab	299	320	4.06803	9.72694	19.477	7.60208	26.03699	54.74724
Xception71	299	320	4.80889	13.5624	27.18822	8.72457	31.55549	69.31018
InceptionV4	299	320	9.50821	13.72104	20.27447	12.99342	25.23416	43.56121