Lightweight Probabilistic Coverage Metrics for Efficient Testing of Deep Neural Networks
Deep neural networks (DNNs) have been deployed in many software systems to assist in various tasks. Despite their great performance, however, DNNs can also exhibit erroneous behaviors and cause massive losses. To assist quality assurance and measure the testing adequacy of DNNs, recent research has proposed many neuron coverage (NC) metrics that measure the proportion of neurons activated during execution. While neuron coverage metrics are analogous to structural code coverage for conventional software programs and reflect the internal behaviors of DNN models during execution, we still lack a comprehensive understanding of how effective neuron coverage is for deep learning testing. Besides, although techniques such as DeepGini and ATS have demonstrated the superiority of output probability vectors over neuron coverage for test selection, these techniques do not serve as coverage metrics and thus cannot be directly compared with neuron coverage in other deep learning testing tasks.
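To make the notion of neuron coverage concrete, below is a minimal sketch (not the paper's implementation) of a threshold-based neuron coverage computation; the function name neuron_coverage, the 0.5 activation threshold, and the per-layer activation arrays are illustrative assumptions.

    import numpy as np

    def neuron_coverage(layer_activations, threshold=0.5):
        """Fraction of neurons activated above `threshold` by at least one input.

        layer_activations: list of arrays, one per layer,
        each shaped (num_inputs, num_neurons_in_layer).
        """
        covered = 0
        total = 0
        for acts in layer_activations:
            # A neuron counts as covered if any input drives it above the threshold.
            covered += int(np.any(acts > threshold, axis=0).sum())
            total += acts.shape[1]
        return covered / total if total else 0.0

    # Example: activations of a 64-neuron and a 10-neuron layer over 100 inputs.
    acts = [np.random.rand(100, 64), np.random.rand(100, 10)]
    print(neuron_coverage(acts))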
This paper systematically evaluates the effectiveness of neuron activation-based coverage in multiple testing application scenarios. In addition, to better understand the bottlenecks of neuron coverage, we propose an output-probability-vector-based coverage metric (named \pt) inspired by existing test selection techniques. We perform comprehensive experiments across three prevalent application scenarios: assessing dataset diversity, improving model retraining, and guiding test generation. Experimental results show that most neuron coverage techniques are not very effective for deep learning testing: coverage based on neuron activation states does not improve testing efficiency the way code coverage does. In contrast, the output-based coverage we introduce demonstrates significantly enhanced effectiveness. Our study improves the comprehension of neuron coverage metrics and provides an important viewpoint for coverage-based testing in the field of deep learning.
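The abstract does not spell out how \pt is defined, so the following is only an illustrative sketch of one way an output-probability-vector-based coverage could be computed: partition the softmax outputs into (predicted class, confidence bucket) cells and report the fraction of cells exercised by the test set. The function name, the bucketing scheme, and the choice of 10 buckets are assumptions, not the paper's metric.

    import numpy as np

    def probability_cell_coverage(prob_vectors, num_buckets=10):
        """Fraction of (predicted class, confidence bucket) cells exercised.

        prob_vectors: array shaped (num_inputs, num_classes) of softmax outputs.
        """
        probs = np.asarray(prob_vectors)
        num_classes = probs.shape[1]
        top_class = probs.argmax(axis=1)
        top_prob = probs.max(axis=1)
        # Map each top probability into one of num_buckets confidence bins.
        bucket = np.minimum((top_prob * num_buckets).astype(int), num_buckets - 1)
        exercised = set(zip(top_class.tolist(), bucket.tolist()))
        return len(exercised) / (num_classes * num_buckets)

A test set that only produces highly confident predictions for a few classes would exercise few cells and thus score low, which is the kind of output-space diversity signal such a metric is meant to capture.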
Sat 21 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 13:00 | Session 7: AI for Software Engineering III (Research Track) at Cosmos 3C
Chair(s): Lina Gong (Nanjing University of Aeronautics and Astronautics)

11:00 (15m, Talk) | Brevity is the Soul of Wit: Condensing Code Changes to Improve Commit Message Generation (Research Track)
Hongyu Kuang (Nanjing University), Ning Zhang (Nanjing University), Hui Gao (Nanjing University), Xin Zhou (Nanjing University), Wesley Assunção (North Carolina State University), Xiaoxing Ma (Nanjing University), Dong Shao (Nanjing University), Guoping Rong (Nanjing University), He Zhang (Nanjing University)

11:15 (15m, Talk) | DesDD: A Design-Enabled Framework with Dual-Layer Debugging for LLM-based Iterative API Orchestrating (Research Track)
Zhuo Cheng (Jiangxi Normal University), Zhou Zou (Jiangxi Normal University), Qing Huang (School of Computer Information Engineering, Jiangxi Normal University), Zhenchang Xing (CSIRO's Data61), Wei Zhang (Jiangxi Meteorological Disaster Emergency Early Warning Center, Jiangxi Meteorological Bureau), Shaochen Wang (Jiangxi Normal University), Xueting Yi (Jiangxi Meteorological Disaster Emergency Early Warning Center, Jiangxi Meteorological Bureau), Huan Jin (School of Information Engineering, Jiangxi University of Technology), Zhiping Liu (College of Information Engineering, Gandong University), Zhaojin Lu (Jiangxi Tellhow Animation College, Tellhow Group Co., Ltd.)

11:30 (15m, Talk) | AUCAD: Automated Construction of Alignment Dataset from Log-Related Issues for Enhancing LLM-based Log Generation (Research Track)
Hao Zhang (Nanjing University), Dongjun Yu (Nanjing University), Lei Zhang (Nanjing University), Guoping Rong (Nanjing University), Yongda Yu (Nanjing University), Haifeng Shen (Southern Cross University), He Zhang (Nanjing University), Dong Shao (Nanjing University), Hongyu Kuang (Nanjing University)

11:45 (15m, Talk) | Enhancement Report Approval Prediction: A Comparative Study of Large Language Models (Research Track)

12:00 (15m, Talk) | MetaCoder: Generating Code from Multiple Perspectives (Research Track)
Chen Xin, Zhijie Jiang (National University of Defense Technology), Yong Guo (National University of Defense Technology), Zhouyang Jia (National University of Defense Technology), Si Zheng (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Shanshan Li (National University of Defense Technology)

12:15 (15m, Talk) | API-Repo: API-centric Repository-level Code Completion (Research Track)
Zhihao Li (State Key Laboratory for Novel Software and Technology, Nanjing University), Chuanyi Li (Nanjing University), Changan Niu (Software Institute, Nanjing University), Ying Yan (State Key Laboratory for Novel Software and Technology, Nanjing University), Jidong Ge (Nanjing University), Bin Luo (Nanjing University)

12:30 (15m, Talk) | AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length (Research Track)
Junhang Cheng (Beihang University), Fang Liu (Beihang University), Chengru Wu (Beihang University), Li Zhang (Beihang University)

12:45 (15m, Talk) | Lightweight Probabilistic Coverage Metrics for Efficient Testing of Deep Neural Networks (Research Track)
Yining Yin (Nanjing University), Yang Feng (Nanjing University), Shihao Weng (Nanjing University), Xinyu Gao, Jia Liu (Nanjing University), Zhihong Zhao (Nanjing University)
Cosmos 3C is the third room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.