HybridRepair: Towards Annotation-Efficient Repair for Deep Learning Models
Thu 21 Jul 2022 16:40 - 17:00 at ISSTA 2 - Session 3-6: Neural Networks, Learning, NLP F
A well-trained deep learning (DL) model often cannot achieve its expected performance after deployment because the distribution of the training data does not match that of the field data in the operational environment. Therefore, repairing DL models is critical, especially when they are deployed on increasingly large tasks with shifted distributions.
Generally speaking, it is easy to obtain a large amount of field data. Existing solutions develop various techniques to select a subset for annotation and then fine-tune the model for repair. While these techniques are effective, achieving a higher repair rate inevitably incurs higher labeling costs. To mitigate this problem, we propose a novel annotation-efficient repair solution for DL models, namely HybridRepair, which takes a holistic approach that coordinates the use of a small amount of annotated data and a large amount of unlabeled data for repair. Our key insight is that accurate yet sufficient training data is needed to repair the corresponding failure region in the data distribution. On the one hand, under a given labeling budget, we selectively annotate some data in each failure region and propagate their labels to the neighboring data. On the other hand, we take advantage of semi-supervised learning (SSL) techniques to further boost the training data density. However, unlike existing SSL solutions that try to use all the unlabeled data, we use only a selected part of it, considering the impact of distribution shift on SSL solutions. Experimental results show that HybridRepair outperforms both state-of-the-art DL model repair solutions and semi-supervised techniques for model improvement, especially when there is a distribution shift between the training data and the field data. Our code is available at https://doi.org/10.5281/zenodo.5914559
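The label propagation step mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' implementation (the released code is at the DOI above). It assumes each sample is represented by a feature vector, that "neighboring" means Euclidean distance within a fixed radius, and that a handful of samples have already been annotated. The function name `propagate_labels` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
import math


def propagate_labels(features, annotated, radius):
    """Illustrative neighbor-based label propagation (hypothetical helper).

    features  : list of feature vectors, one per field-data sample
    annotated : dict mapping sample index -> human-provided label
    radius    : assumed distance threshold for treating a sample as a neighbor

    Returns a list of labels, with None for samples that stay unlabeled.
    """
    result = [None] * len(features)
    if not annotated:
        return result

    for i, x in enumerate(features):
        if i in annotated:
            # Annotated samples keep their human-provided label.
            result[i] = annotated[i]
            continue
        # Find the closest annotated sample in feature space.
        nearest = min(annotated, key=lambda k: math.dist(x, features[k]))
        # Copy its label only if the sample is close enough to trust it.
        if math.dist(x, features[nearest]) <= radius:
            result[i] = annotated[nearest]
    return result


# Two annotated samples (indices 0 and 2); the last sample is too far from both.
samples = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 10)]
print(propagate_labels(samples, {0: 1, 2: 0}, radius=0.5))
# [1, 1, 0, 0, None]
```

Samples left as `None` here correspond to the remaining unlabeled data. In the abstract's description, only a selected part of that data is then passed to the SSL stage, and this sketch does not model that selection.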
Wed 20 Jul, 07:00 - 08:20 (displayed time zone: Seoul)
Time | Length | Type | Title | Track | Authors
07:00 | 20m | Talk | Cross-Lingual Transfer Learning for Statistical Type Inference (ACM SIGSOFT Distinguished Paper) | Technical Papers | Zhiming Li (Nanyang Technological University, Singapore), Xiaofei Xie (Singapore Management University, Singapore), Haoliang Li (City University of Hong Kong), Zhengzi Xu (Nanyang Technological University), Yi Li (Nanyang Technological University), Yang Liu (Nanyang Technological University)
07:20 | 20m | Talk | DocTer: Documentation-Guided Fuzzing for Testing Deep Learning API Functions | Technical Papers | Danning Xie (Purdue University), Yitong Li (University of Waterloo), Mijung Kim (UNIST), Hung Viet Pham (University of Waterloo), Lin Tan (Purdue University), Xiangyu Zhang (Purdue University), Michael W. Godfrey (University of Waterloo, Canada)
07:40 | 20m | Talk | HybridRepair: Towards Annotation-Efficient Repair for Deep Learning Models | Technical Papers
08:00 | 20m | Talk | Human-in-the-Loop Oracle Learning for Semantic Bugs in String Processing Programs | Technical Papers | Charaka Geethal (Monash University), Thuan Pham (The University of Melbourne), Aldeida Aleti (Monash University), Marcel Böhme (MPI-SP, Germany and Monash University, Australia)
Thu 21 Jul, 16:20 - 17:40 (displayed time zone: Seoul)
Time | Length | Type | Title | Track | Authors
16:20 | 20m | Talk | AEON: A Method for Automatic Evaluation of NLP Test Cases | Technical Papers | Jen-tse Huang (The Chinese University of Hong Kong), Jianping Zhang (The Chinese University of Hong Kong), Wenxuan Wang (The Chinese University of Hong Kong), Pinjia He (The Chinese University of Hong Kong, Shenzhen), Yuxin Su (Sun Yat-sen University), Michael Lyu (The Chinese University of Hong Kong)
16:40 | 20m | Talk | HybridRepair: Towards Annotation-Efficient Repair for Deep Learning Models | Technical Papers
17:00 | 20m | Talk | Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment | Technical Papers | Geunwoo Kim (University of California, Irvine, USA), Sanghyun Hong (Oregon State University), Michael Franz (University of California, Irvine), Dokyung Song (Yonsei University, South Korea)
17:20 | 20m | Talk | Human-in-the-Loop Oracle Learning for Semantic Bugs in String Processing Programs | Technical Papers | Charaka Geethal (Monash University), Thuan Pham (The University of Melbourne), Aldeida Aleti (Monash University), Marcel Böhme (MPI-SP, Germany and Monash University, Australia)