5.3. Model Quantization

Int8 quantization is one of the key features of PaddleSlim. It supports two quantization-aware training strategies (dynamic and static), both layer-wise and channel-wise quantization, and deployment of the resulting models with PaddleLite.
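To make the difference between layer-wise and channel-wise quantization concrete, here is a minimal sketch in plain Python. It is not PaddleSlim's implementation: it assumes symmetric abs-max scaling, and the helper names (`abs_max_scale`, `quantize`, `dequantize`, `reconstruction_errors`) and the toy weights are hypothetical, chosen only for illustration.

```python
from typing import List


def abs_max_scale(values: List[float], bits: int = 8) -> float:
    """Scale that maps the largest magnitude onto the int8 limit (127)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, bits: int = 8) -> List[int]:
    """Round to the nearest integer step and clamp to the symmetric range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


def reconstruction_errors(weights: List[List[float]], per_channel: bool) -> List[float]:
    """Largest absolute round-trip error in each channel."""
    if per_channel:
        # Channel-wise: every output channel gets its own scale.
        scales = [abs_max_scale(channel) for channel in weights]
    else:
        # Layer-wise: one scale shared by the whole weight tensor.
        shared = abs_max_scale([v for channel in weights for v in channel])
        scales = [shared] * len(weights)
    errors = []
    for channel, scale in zip(weights, scales):
        restored = dequantize(quantize(channel, scale), scale)
        errors.append(max(abs(a - b) for a, b in zip(channel, restored)))
    return errors


# Toy weights (made up): one small-magnitude and one large-magnitude channel.
weights = [
    [0.01, -0.02, 0.015],
    [1.5, -2.0, 0.75],
]

layer_wise = reconstruction_errors(weights, per_channel=False)
channel_wise = reconstruction_errors(weights, per_channel=True)
print("layer-wise errors per channel:  ", layer_wise)
print("channel-wise errors per channel:", channel_wise)
```

With a shared scale, the large channel dictates the step size, so the small channel loses most of its resolution. With per-channel scales, the small channel is reconstructed far more precisely, at the cost of storing one scale per channel.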

Using this toolkit, PaddleClas quantized the mobilenet_v3_large_x1_0 model, which reaches an accuracy of 78.9% after distillation. After quantization, the prediction time on SD855 (Snapdragon 855) drops from 19.308ms to 14.395ms, the storage size shrinks from 21M to 10M, and the top-1 recognition accuracy is 75.9%. For the specific training method, please refer to PaddleSlim quant aware.
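The trade-off in these figures can be worked out with a few lines of arithmetic. The snippet below uses only the numbers quoted above and assumes both accuracy figures are measured with the same metric; the variable names are illustrative.

```python
# Figures quoted in the text for mobilenet_v3_large_x1_0 on SD855.
latency_before_ms, latency_after_ms = 19.308, 14.395
size_before_m, size_after_m = 21, 10
accuracy_before, accuracy_after = 78.9, 75.9

speedup = latency_before_ms / latency_after_ms          # how many times faster
latency_cut = 1 - latency_after_ms / latency_before_ms  # fraction of time saved
size_ratio = size_before_m / size_after_m               # how many times smaller
accuracy_drop = accuracy_before - accuracy_after        # percentage points lost

print(f"speedup:       {speedup:.2f}x")        # 1.34x
print(f"latency cut:   {latency_cut:.1%}")     # 25.4%
print(f"size ratio:    {size_ratio:.1f}x")     # 2.1x
print(f"accuracy drop: {accuracy_drop:.1f} points")  # 3.0 points
```

In other words, quantization buys roughly a one-third speedup and a model about half the size in exchange for about three points of accuracy.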