Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data
Reinforcement learning from demonstrations (RLfD) is a promising approach to improving the exploration efficiency of reinforcement learning (RL) by learning from expert demonstrations in addition to interactions with the environment. In this paper, we propose a framework that combines techniques from search-based testing with RLfD, with the goal of raising the level of dependability of RL policies and reducing human engineering effort. Within our framework, we provide methods for efficiently training, evaluating, and repairing RL policies. Instead of relying on the costly collection of demonstrations from (human) experts, we automatically compute a diverse set of demonstrations via search-based fuzzing methods and use these fuzz demonstrations for RLfD. To evaluate the safety and robustness of the trained RL agent, we search for safety-critical scenarios in the black-box environment. Finally, when unsafe behavior is detected, we compute demonstrations through fuzz testing that represent safe behavior and use them to repair the policy. Our experiments show that our framework is able to efficiently learn high-performing and safe policies without requiring any expert knowledge.
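As a rough illustration of the first step described above (computing demonstrations by search-based fuzzing rather than collecting them from experts), the following is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the 1-D corridor environment, the constants, and the names `run`, `fuzz_demonstrations`, and `to_transitions` are hypothetical and are not taken from the paper, which targets deep RL policies in black-box environments. The search here is a simple mutation-based hill climber with restarts, standing in for whatever fuzzing strategy the framework actually uses.

```python
import random

# Hypothetical toy environment (not from the paper): a 1-D corridor.
# The agent starts at START, must reach GOAL, and must never enter HAZARD.
GOAL, HAZARD, START, HORIZON = 5, -2, 0, 12
ACTIONS = (-1, 1)  # step left / step right


def run(actions):
    """Replay an action sequence; return (score, reached_goal)."""
    pos = START
    for a in actions:
        pos += a
        if pos == HAZARD:
            return -100, False  # unsafe trace
        if pos == GOAL:
            return 100, True    # successful, safe trace
    return -abs(GOAL - pos), False  # safe but unfinished: closer is better


def fuzz_demonstrations(n_demos, rng, restarts=500, mutations=200):
    """Search for distinct goal-reaching traces by mutating action sequences."""
    demos = []
    for _ in range(restarts):
        if len(demos) == n_demos:
            break
        seq = [rng.choice(ACTIONS) for _ in range(HORIZON)]
        score, done = run(seq)
        for _ in range(mutations):
            if done:
                break
            # Mutate one randomly chosen action and keep the child
            # if it scores at least as well as its parent.
            child = list(seq)
            child[rng.randrange(HORIZON)] = rng.choice(ACTIONS)
            child_score, child_done = run(child)
            if child_score >= score:
                seq, score, done = child, child_score, child_done
        if done and seq not in demos:
            demos.append(seq)
    return demos


def to_transitions(actions):
    """Turn a trace into (state, action, next_state) tuples up to the goal."""
    pos, out = START, []
    for a in actions:
        out.append((pos, a, pos + a))
        pos += a
        if pos == GOAL:
            break
    return out
```

In an RLfD setup, transitions such as those returned by `to_transitions` would typically be used to seed the learner, for example by pre-filling a replay buffer. The evaluation and repair steps could reuse the same kind of search with a different objective (looking for traces that end in the hazard, then for safe alternatives from those states), but that is beyond what this sketch shows.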
Wed 17 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Program Repair 2 | Journal-first Papers / Research Track / Software Engineering in Practice | at Pequeno Auditório | Chair(s): Xiang Gao (Beihang University)
14:00 | 15m | Talk | Practical Program Repair via Preference-based Ensemble Strategy | Research Track | Wenkang Zhong (State Key Laboratory for Novel Software and Technology, Nanjing University, 22 Hankou Road, Nanjing, China); Chuanyi Li (Nanjing University); Kui Liu (Huawei); Tongtong Xu (Huawei); Jidong Ge (Nanjing University); Tegawendé F. Bissyandé (University of Luxembourg); Bin Luo (Nanjing University); Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688)
14:15 | 15m | Talk | Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data | Research Track | Martin Tappler (TU Graz; Silicon Austria Labs); Andrea Pferscher (Institute of Software Technology, Graz University of Technology); Bernhard Aichernig (Graz University of Technology); Bettina Könighofer (Graz University of Technology)
14:30 | 15m | Talk | BinAug: Enhancing Binary Similarity Analysis with Low-Cost Input Repairing | Research Track | WONG Wai Kin (Hong Kong University of Science and Technology); Huaijin Wang (Hong Kong University of Science and Technology); Li Zongjie (Hong Kong University of Science and Technology); Shuai Wang (The Hong Kong University of Science and Technology)
14:45 | 15m | Talk | Constraint Based Program Repair for Persistent Memory Bugs | Research Track
15:00 | 15m | Talk | User-Centric Deployment of Automated Program Repair at Bloomberg | Software Engineering in Practice | David Williams (University College London); James Callan (UCL); Serkan Kirbas (Bloomberg LP); Sergey Mechtaev (University College London); Justyna Petke (University College London); Thomas Prideaux-Ghee (Bloomberg LP); Federica Sarro (University College London)
15:15 | 7m | Talk | AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities | Journal-first Papers | Michael Fu (Monash University); Kla Tantithamthavorn (Monash University); Trung Le (Monash University, Australia); Yuki Kume (Monash University); Van Nguyen (Monash University); Dinh Phung (Monash University, Australia); John Grundy (Monash University)