This paper studies the optimization of deep neural network (DNN) quantization for noisy inputs. Existing compression techniques often generate DNNs that are sensitive to external errors. Because embedded devices are exposed to external errors such as dust and fog, DNNs running on those devices need to be robust to such errors. For robust quantization of DNNs, we formulate an optimization problem that finds the bit width for each DNN layer so as to minimize the robustness loss. To efficiently find the solution, we design a dynamic-programming-based algorithm called QED. By exploiting optimal substructures and overlapping subproblems, QED cuts down unnecessary computation in the search for optimal bit widths. We also propose an incremental algorithm, Q*, that quickly finds a reasonably robust quantization and then gradually improves the solution. We have extensively evaluated QED and Q* with three DNN models (LeNet, AlexNet, and VGG-16) and with Gaussian random errors as well as realistic errors simulated by DeepXplore and Automold. For comparison, we also evaluate universal quantization, which uses an equal bit width for all layers, and Deep Compression, a weight-sharing-based DNN compression technique. When tested with errors of increasing size, QED most robustly gives correct inference output. Even when a DNN instance is optimized for robustness, we show that its quantizations may not be robust unless QED is used. In our case study, the decision boundary of QED is close to that of the original DNN, while the boundaries of universal quantization and Deep Compression are skewed; i.e., for some inputs, very small errors result in incorrect inference output. Furthermore, we evaluate Q* for its trade-off between execution time and robustness. In one tenth of QED's execution time, Q* gives a quantization that is 98% as robust as the one produced by QED.
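The abstract describes QED as a dynamic-programming search over per-layer bit widths that minimizes a robustness loss. The sketch below illustrates what such a search can look like; it is a minimal illustration, not the paper's algorithm. It assumes (beyond what the abstract states) that the robustness loss decomposes into per-layer terms loss[i][w] for layer i at bit width w and that the search is constrained by a hypothetical total bit budget.

```python
def dp_bit_widths(loss, widths, budget):
    """Sketch of a DP bit-width search over layers (assumptions noted above).

    loss:   list of dicts; loss[i][w] is an estimated robustness loss of
            layer i quantized to w bits (hypothetical per-layer model).
    widths: candidate bit widths, e.g. [2, 4, 8].
    budget: hypothetical total number of bits allowed across all layers.
    Returns (minimum total loss, per-layer bit-width assignment).
    """
    n = len(loss)
    INF = float("inf")
    # best[i][b]: minimum loss for the first i layers using exactly b bits.
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    choice = [[None] * (budget + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for b in range(budget + 1):
            for w in widths:
                # Optimal substructure: extend the best (i-1)-layer solution.
                if w <= b and best[i - 1][b - w] + loss[i - 1][w] < best[i][b]:
                    best[i][b] = best[i - 1][b - w] + loss[i - 1][w]
                    choice[i][b] = w
    # Pick the best feasible total budget, then backtrack the chosen widths.
    b_star = min(range(budget + 1), key=lambda b: best[n][b])
    if best[n][b_star] == INF:
        raise ValueError("no feasible bit-width assignment within the budget")
    assignment, b = [], b_star
    for i in range(n, 0, -1):
        w = choice[i][b]
        assignment.append(w)
        b -= w
    assignment.reverse()
    return best[n][b_star], assignment


# Toy usage with made-up losses for a 3-layer network:
toy_loss = [{2: 0.9, 4: 0.3, 8: 0.1},
            {2: 0.5, 4: 0.2, 8: 0.15},
            {2: 0.8, 4: 0.4, 8: 0.05}]
print(dp_bit_widths(toy_loss, widths=[2, 4, 8], budget=16))
```

Memoizing best[i][b] is where the overlapping-subproblems observation pays off: each (layer, budget) state is solved once and reused, instead of re-enumerating all width combinations.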
Sat 22 Feb, Pacific Time (US & Canada)
13:00 - 14:30 | Session 2: Techniques for Specific Domains (Main Conference)
Chair(s): Dongyoon Lee (Stony Brook University)

13:00 (22m) Research paper | Generating Fast Sparse Matrix Vector Multiplication From a High Level Generic Functional IR
Federico Pizzuti (University of Edinburgh), Michel Steuwer (University of Glasgow), Christophe Dubach (University of Edinburgh)

13:22 (22m) Research paper | A Study of Event Frequency Profiling with Differential Privacy
Hailong Zhang (Ohio State University), Yu Hao, Sufian Latif (Ohio State University, USA), Raef Bassily (Ohio State University, USA), Atanas Rountev (Ohio State University)

13:45 (22m) Research paper | Improving Database Query Performance with Automatic Fusion
Hanfeng Chen (McGill University, Canada), Alexander Krolik (McGill University, Canada), Bettina Kemme (McGill University, Canada), Clark Verbrugge (McGill University, Canada), Laurie Hendren (McGill University, Canada)

14:07 (22m) Research paper | Robust Quantization of Deep Neural Networks
Youngseok Kim (Hanyang University, Korea), Junyeol Lee (Hanyang University, Korea), Younghoon Kim (Hanyang University, Korea), Jiwon Seo (Hanyang University)