Mar 11, 2022, 11:30am EC4-2101A
Deep convolutional neural networks (CNNs) have emerged as a powerful tool for many computer vision tasks. However, these models are computationally expensive and difficult to adapt to resource-constrained environments. With the proliferation of CNNs on mobile devices, there is a growing need for methods to reduce their latency and power consumption. Furthermore, we would like a principled approach to the design and understanding of CNN model behaviour. Quantization has become an essential tool for accelerating CNN inference and reducing power consumption. However, it can be difficult to understand the different sources of quantization error and why a given model's quantized output deviates from its floating-point behaviour. This work describes a novel framework for fine-grained analysis of the quantized behaviour of efficient CNN architectures, and for leveraging those insights in the quantization-aware design of CNN models. We demonstrate the use of this framework in two quantization-aware model design contexts and discuss how the insights gained could be further leveraged to make more quantization-friendly design choices.
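To make the notion of quantization error concrete, the following is a minimal sketch of simulated ("fake") quantization, assuming uniform affine quantization of activations; it is not the framework described in the abstract, and the function name and tensor shapes are illustrative. It rounds a float tensor to a low-bit grid, maps it back to float, and reports how the error grows as the bit width shrinks.

```python
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Simulate uniform affine (asymmetric) quantization of a tensor,
    then map back to float so the rounding error can be inspected.
    Illustrative helper, not part of any specific framework."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

# Toy activation tensor standing in for one CNN layer's output.
rng = np.random.default_rng(0)
activations = rng.normal(0.0, 1.0, size=(1, 64, 16, 16))

for bits in (8, 6, 4):
    deq = quantize_dequantize(activations, num_bits=bits)
    noise = activations - deq
    # SQNR (signal-to-quantization-noise ratio) drops as bits decrease.
    sqnr = 10 * np.log10(np.mean(activations ** 2) / np.mean(noise ** 2))
    print(f"{bits}-bit: max |error| = {np.abs(noise).max():.4f}, "
          f"SQNR = {sqnr:.1f} dB")
```

Comparing such per-layer error statistics between a floating-point model and its quantized counterpart is one simple way to localize where quantized behaviour starts to deviate.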