The reliability of decision-making policies is a pressing concern today, as these policies form the foundation of many critical applications, such as autonomous driving and robotics. To ensure reliability, there have been a number of research efforts on testing decision-making policies that solve Markov decision processes (MDPs). However, because these policies are inherently based on deep neural networks (DNNs) and operate over infinite state spaces, developing scalable and effective testing frameworks for them remains an open challenge.
In this paper, we present an effective testing framework for decision-making policies. The framework adopts a test case generator based on a generative diffusion model that can easily adapt to different search spaces, ensuring the practicality and validity of test cases. We then propose guidance based on termination-state novelty to diversify agent behaviors and improve test effectiveness. Finally, we evaluate the framework on five widely used benchmarks, including autonomous driving, aircraft collision avoidance, and gaming scenarios. The results demonstrate that our approach identifies more diverse and influential failure-triggering test cases than current state-of-the-art techniques. Moreover, we use the detected failure cases to repair the evaluated models, achieving a greater robustness improvement than the baseline method.
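The abstract does not say how termination-state novelty is computed, so the following is only an illustrative sketch, not the paper's method. It assumes that termination states are fixed-length numeric vectors and scores novelty as the mean Euclidean distance to the k nearest previously seen states, a measure borrowed from the novelty search literature. The names `novelty`, `select_most_novel`, `archive`, and `candidates` are hypothetical.

```python
import math


def novelty(state, archive, k=3):
    """Mean Euclidean distance from `state` to its k nearest archived states.

    Assumption: this k-nearest-neighbor score is a stand-in; the abstract
    does not define the novelty measure used by the framework.
    """
    if not archive:
        return math.inf  # nothing seen yet, so treat the state as maximally novel
    dists = sorted(math.dist(state, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def select_most_novel(candidates, archive, k=3):
    """Return the candidate termination state with the highest novelty score."""
    return max(candidates, key=lambda s: novelty(s, archive, k))


# Hypothetical 2-D termination states (e.g. final agent positions).
archive = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
candidates = [(0.05, 0.05), (2.0, 2.0)]
best = select_most_novel(candidates, archive)
```

Under these assumptions, a generator guided by such a score would favor test cases whose episodes end far from previously observed termination states, which is one way to encourage diverse agent behaviors.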
slides (slides.pptx) | 3.69MiB |
Generative Model-Based Testing on Decision-Making Policies (ASE_2023_Generative_Model_based_Testing_on_Decision_Making_Policies.pdf) | 1.57MiB |
Tue 12 Sep (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
13:30 - 15:00 | Testing AI Systems 2 | NIER Track / Journal-first Papers / Research Papers at Room C | Chair(s): Lwin Khin Shar (Singapore Management University)
13:30 | 12m | Talk | ATOM: Automated Black-Box Testing of Multi-Label Image Classification Systems | Research Papers | Shengyou Hu (Nanjing University), Huayao Wu (Nanjing University), Peng Wang (Fudan University), Jing Chang (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Yongjun Tu (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Xiu Jiang (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Xintao Niu (Nanjing University), Changhai Nie (Nanjing University)
13:42 | 12m | Talk | Automating Bias Testing of LLMs | NIER Track | Sergio Morales (Universitat Oberta de Catalunya), Robert Clarisó (Universitat Oberta de Catalunya), Jordi Cabot (Luxembourg Institute of Science and Technology)
13:54 | 12m | Talk | MUTEN: Mutant-Based Ensembles for Boosting Gradient-Based Adversarial Attack | NIER Track | Qiang Hu (University of Luxembourg), Yuejun Guo (Luxembourg Institute of Science and Technology), Maxime Cordy (University of Luxembourg, Luxembourg), Mike Papadakis (University of Luxembourg, Luxembourg), Yves Le Traon (University of Luxembourg, Luxembourg)
14:06 | 12m | Research paper | Generative Model-Based Testing on Decision-Making Policies | Research Papers | Zhuo Li (Kyushu University), Xiongfei Wu (Kyushu University), Derui Zhu (Technical University of Munich), Mingfei Cheng (Singapore Management University), Siyuan Chen (Kyushu University), Fuyuan Zhang (Kyushu University), Xiaofei Xie (Singapore Management University), Lei Ma (University of Alberta), Jianjun Zhao (Kyushu University)
14:18 | 12m | Talk | Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-based Safety-critical Systems | Journal-first Papers | Hazem FAHMY (University of Luxembourg), Fabrizio Pastore (University of Luxembourg), Lionel Briand (University of Luxembourg; University of Ottawa), Thomas Stifter (IEE S.A.)
14:30 | 12m | Talk | Are We Ready to Embrace Generative AI for Software Q&A? | NIER Track | Bowen Xu (North Carolina State University), Thanh-Dat Nguyen (University of Melbourne), Le-Cong Thanh (The University of Melbourne), Thong Hoang (CSIRO's Data61), Jiakun Liu (Singapore Management University), Kisub Kim (Singapore Management University, Singapore), Chen GONG (University of Virginia), Changan Niu (Software Institute, Nanjing University), Chenyu Wang (Singapore Management University), Xuan-Bach D. Le (University of Melbourne), David Lo (Singapore Management University)