Revisiting "Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion": A Critical Review and Implications on DNN Coverage Testing
We present a critical review of Neural Coverage (NLC), a state-of-the-art DNN coverage criterion proposed by Yuan et al. at ICSE 2023. Although NLC aims to satisfy eight design requirements and demonstrates strong empirical performance, we question some of its theoretical and empirical assumptions. We observe that NLC deviates from core principles of coverage criteria, such as monotonicity and test suite order independence, and does not fully account for key properties of the covariance matrix. We also identify threats to the validity of the original empirical study, related to the ground-truth ordering of test suites. We substantiate these claims through our own empirical validation and propose improvements for future DNN coverage metrics. Finally, we discuss the implications of these insights for DNN coverage testing.
Fri 17 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
16:00 - 17:30 | Software Engineering for AI 8 | Research Track / New Ideas and Emerging Results (NIER) | Oceania VII | Chair(s): Sheila Reinehr, Pontifícia Universidade Católica do Paraná (PUCPR)
16:00 | 15m | Talk | TaskEval: Synthesised Evaluation for Foundation-Model Tasks | New Ideas and Emerging Results (NIER) | Dilani Widanapathiranage, Applied Artificial Intelligence Initiative, Deakin University; Scott Barnett, Applied Artificial Intelligence Initiative, Deakin University; Stefanus Kurniawan, Deakin University; Wannita Takerngsaksiri, Applied Artificial Intelligence Initiative, Deakin University
16:15 | 15m | Talk | SpecOps: A Fully Automated AI Agent Testing Framework in Real-World GUI Environments | Research Track | Syed Yusuf Ahmed, Purdue University; Shiwei Feng, Purdue University; Chanwoo Bae, Purdue University; Calix Barrus, University of Texas at San Antonio; Xiangyu Zhang, Purdue University
16:30 | 15m | Talk | Revisiting "Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion": A Critical Review and Implications on DNN Coverage Testing | Research Track | Jinhan Kim, Università della Svizzera italiana; Nargiz Humbatova, Università della Svizzera italiana; Gunel Jahangirova, King's College London; Shin Yoo, KAIST; Paolo Tonella, USI Lugano | Pre-print available
16:45 | 15m | Talk | VADA: A Multicultural Benchmark for Value-Aware Data Generation and Alignment Evaluation in LLMs | Research Track | Zhenlun Zhang, Nanjing University; Yang Feng, Nanjing University; Shihao Weng, Nanjing University; Yining Yin, Nanjing University; Jincheng Li, Nanjing University; Jia Liu, Nanjing University
17:00 | 15m | Talk | Evaluating the effectiveness of LLM-based interoperability | Research Track | Rodrigo Falcão, Fraunhofer IESE; Stefan Schweitzer, Fraunhofer Institute for Experimental Software Engineering; Julien Siebert, Fraunhofer IESE; Emily Calvet, Fraunhofer Institute for Experimental Software Engineering; Frank Elberzhager, Fraunhofer Institute for Experimental Software Engineering
17:15 | 15m | Talk | Beyond Correctness: Exposing LLM-generated Logical Flaws in Reasoning via Multi-step Automated Theorem Proving | Research Track | Xinyi Zheng, Huazhong University of Science and Technology; Ningke Li, National University of Singapore; Xiaokun Luan, Peking University; Kailong Wang, Huazhong University of Science and Technology; Ling Shi, Nanyang Technological University; Meng Sun, Peking University; Haoyu Wang, Huazhong University of Science and Technology