Audio classification systems powered by deep neural networks (DNNs) are integral to applications that affect daily life, such as voice-activated assistants. Ensuring the accuracy of these systems is crucial, since misclassifications can lead to serious security issues and user mistrust. However, testing audio classifiers poses a significant challenge: manually labeling audio test inputs is expensive. Test input prioritization has emerged as a promising way to reduce this labeling cost. It ranks potentially misclassified tests first, so that these critical inputs are labeled early and debugging becomes more efficient. However, existing test prioritization methods have limitations when applied to audio test inputs: 1) Coverage-based methods are less effective and less efficient than confidence-based methods. 2) Confidence-based methods rely only on prediction probability vectors and ignore the unique characteristics of audio data. 3) Mutation-based methods lack mutation operators designed for audio data, which makes them unsuitable for audio test inputs. To overcome these limitations, we propose AudioTest, a novel test prioritization approach designed specifically for audio test inputs. Its core premise is that tests closer to misclassified samples are more likely to be misclassified themselves. Based on the characteristics of audio data, AudioTest generates four types of features: time-domain features, frequency-domain features, perceptual features, and output features. For each test, AudioTest concatenates the four types of features into a feature vector and applies a carefully designed feature transformation strategy to bring misclassified tests closer together in the feature space. AudioTest then uses a trained model to predict each test's probability of misclassification from its transformed feature vector and ranks all tests accordingly.
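The pipeline described above (build a per-test feature vector, score it, rank by score) can be sketched roughly as follows. This is a minimal illustration and not the AudioTest implementation: it covers only two of the four feature families (time-domain and output features), it leaves out the feature transformation step, and the scoring function `placeholder_score` is a hypothetical stand-in for the trained model mentioned in the abstract. All function names here are assumptions made for the example.

```python
import math

def time_domain_features(signal):
    """Two simple time-domain features: RMS energy and zero-crossing rate."""
    n = len(signal)
    if n < 2:
        raise ValueError("signal needs at least two samples")
    rms = math.sqrt(sum(x * x for x in signal) / n)
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (n - 1)]

def output_features(probs):
    """Features from the classifier's probability vector: max prob, entropy, Gini."""
    max_prob = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    gini = 1.0 - sum(p * p for p in probs)
    return [max_prob, entropy, gini]

def feature_vector(signal, probs):
    """Concatenate the feature families into a single vector for one test."""
    return time_domain_features(signal) + output_features(probs)

def placeholder_score(vector):
    """Hypothetical stand-in for a trained model: uses the Gini feature only."""
    return vector[4]

def rank_tests(vectors, score_fn):
    """Return test indices ordered from most to least likely misclassified."""
    scores = [score_fn(v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)

# Test 0: the classifier is confident. Test 1: the classifier is undecided.
vectors = [
    feature_vector([1.0, -1.0, 1.0, -1.0], [0.9, 0.1]),
    feature_vector([1.0, -1.0, 1.0, -1.0], [0.5, 0.5]),
]
print(rank_tests(vectors, placeholder_score))  # [1, 0]
```

In a fuller version, frequency-domain and perceptual features would be appended to the same vector, and `placeholder_score` would be replaced by a model trained to predict misclassification.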
We evaluate AudioTest on 96 subjects, covering both natural and noisy datasets. We employ two classical metrics for the evaluation: Percentage of Fault Detection (PFD) and Average Percentage of Fault Detected (APFD). The results demonstrate that AudioTest outperforms all the compared test prioritization approaches in terms of both PFD and APFD. The average improvement of AudioTest over the baseline test prioritization methods ranges from 12.63% to 54.58% on natural datasets and from 12.71% to 40.48% on noisy datasets.
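The abstract does not spell out how the two metrics are computed. The sketch below assumes the definitions commonly used in the test prioritization literature: PFD is the share of all misclassified tests found within a given fraction of the prioritized list, and APFD is 1 - (sum of the 1-based ranks of the misclassified tests) / (k * n) + 1 / (2n), for n tests of which k are misclassified. The function names and the boolean-list input format are assumptions made for illustration.

```python
def apfd(ranked_is_fault):
    """APFD for a prioritized list; True marks a misclassified test."""
    n = len(ranked_is_fault)
    positions = [i + 1 for i, is_fault in enumerate(ranked_is_fault) if is_fault]
    k = len(positions)
    if n == 0 or k == 0:
        raise ValueError("need at least one test and one misclassified test")
    return 1.0 - sum(positions) / (k * n) + 1.0 / (2 * n)

def pfd(ranked_is_fault, fraction):
    """Share of all misclassified tests found in the top `fraction` of the list."""
    total = sum(ranked_is_fault)
    if total == 0:
        raise ValueError("need at least one misclassified test")
    budget = int(len(ranked_is_fault) * fraction)
    return sum(ranked_is_fault[:budget]) / total

best = [True, True, False, False]    # misclassified tests ranked first
worst = [False, False, True, True]   # misclassified tests ranked last

print(apfd(best), apfd(worst))          # 0.75 0.25
print(pfd(best, 0.5), pfd(worst, 0.5))  # 1.0 0.0
```

Under these definitions, a higher value of either metric means the prioritization surfaces misclassified tests earlier.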
Fri 27 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:00 - 15:30 | AI Testing | Research Papers / Tool Demonstrations at Cosmos 3A | Chair(s): Cuiyun Gao (Harbin Institute of Technology)
14:00 | 25m | Talk | AudioTest: Prioritizing Audio Test Cases | Research Papers | Yinghua Li (University of Luxembourg); Xueqi Dang (University of Luxembourg, SnT); Wendkuuni Arzouma Marc Christian OUEDRAOGO (University of Luxembourg); Jacques Klein (University of Luxembourg); Tegawendé F. Bissyandé (University of Luxembourg)
14:25 | 25m | Talk | S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models | Research Papers | Xiaohan Yuan (Zhejiang University); Jinfeng Li (Alibaba Group); Dongxia Wang (Zhejiang University); Yuefeng Chen (Alibaba Group); Xiaofeng Mao (Alibaba Group); Longtao Huang (Alibaba Group); Jialuo Chen (Zhejiang University); Hui Xue (Alibaba Group); Xiaoxia Liu (Zhejiang University); Wenhai Wang (Zhejiang University); Kui Ren (Zhejiang University); Jingyi Wang (Zhejiang University)
14:50 | 25m | Talk | Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing | Research Papers | Yanzhou Mu; Juan Zhai (University of Massachusetts at Amherst); Chunrong Fang (Nanjing University); Xiang Chen (Nantong University); Zhixiang Cao (Xi'an Jiaotong University); Peiran Yang (Nanjing University); Kexin Zhao (Nanjing University); An Guo (Nanjing University); Zhenyu Chen (Nanjing University)
15:15 | 15m | Demonstration | ASTRAL: A Tool for the Automated Safety Testing of Large Language Models | Tool Demonstrations | Miriam Ugarte (Mondragon University); Pablo Valle (Mondragon University); José Antonio Parejo Maestre (Universidad de Sevilla); Sergio Segura (SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain); Aitor Arrieta (Mondragon University)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.