HybridRepair: Towards Annotation-Efficient Repair for Deep Learning Models (ISSTA 2022 - Technical Papers)

Who

Yu Li, Muxi Chen, Qiang Xu

Track

ISSTA 2022 Technical Papers

Time Zone

Times are shown in the conference time zone, (GMT+09:00) Seoul.

When

Wed 20 Jul 2022 07:40 - 08:00 at ISSTA 2 - Session 2-2: Neural Networks, Learning, NLP E
Thu 21 Jul 2022 16:40 - 17:00 at ISSTA 2 - Session 3-6: Neural Networks, Learning, NLP F

Abstract

A well-trained deep learning (DL) model often cannot achieve the expected performance after deployment due to the mismatch between the distributions of the training data and the field data in the operational environment. Therefore, repairing DL models is critical, especially when they are deployed on increasingly large tasks with shifted distributions.

Generally speaking, it is easy to obtain a large amount of field data. Existing solutions select a subset of this data for annotation and then fine-tune the model for repair. While effective, these solutions can achieve a higher repair rate only at a higher labeling cost. To mitigate this problem, we propose a novel annotation-efficient repair solution for DL models, namely HybridRepair, which takes a holistic approach that coordinates the use of a small amount of annotated data and a large amount of unlabeled data for repair. Our key insight is that accurate yet sufficient training data is needed to repair the corresponding failure region in the data distribution. Under a given labeling budget, we selectively annotate some data in each failure region and propagate their labels to the neighboring data. In addition, we take advantage of semi-supervised learning (SSL) techniques to further boost the training data density. However, unlike existing SSL solutions that try to use all the unlabeled data, we use only a selected part of it, considering the impact of distribution shift on SSL solutions. Experimental results show that HybridRepair outperforms both state-of-the-art DL model repair solutions and semi-supervised techniques for model improvement, especially when there is a distribution shift between the training data and the field data. Our code is available at https://doi.org/10.5281/zenodo.5914559
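
The idea of propagating labels from a few annotated samples to neighboring data can be illustrated with a small sketch. This is not the HybridRepair implementation: the function name `propagate_labels`, the use of Euclidean distance in a feature space, and the fixed `radius` threshold are all assumptions made here for illustration. Each unlabeled sample takes the label of its nearest annotated sample if that sample lies within `radius`, and is otherwise left unlabeled (marked `-1`).

```python
import numpy as np


def propagate_labels(features, labeled_idx, labels, radius):
    """Hypothetical helper: spread labels from annotated samples to close neighbors.

    features    : (n, d) array of feature vectors for all field data
    labeled_idx : indices of the samples that were manually annotated
    labels      : integer labels of those annotated samples
    radius      : assumed distance threshold for trusting a propagated label
    Returns an (n,) integer array; -1 marks samples that stay unlabeled.
    """
    features = np.asarray(features, dtype=float)
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels)

    pseudo = np.full(len(features), -1, dtype=int)

    # Pairwise Euclidean distances: every sample vs. every annotated sample.
    anchors = features[labeled_idx]
    dists = np.linalg.norm(features[:, None, :] - anchors[None, :, :], axis=2)

    # Nearest annotated sample for each point, accepted only within the radius.
    nearest = dists.argmin(axis=1)
    close = dists.min(axis=1) <= radius
    pseudo[close] = labels[nearest[close]]

    # Annotated samples always keep their own label.
    pseudo[labeled_idx] = labels
    return pseudo


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]]
    print(propagate_labels(feats, labeled_idx=[0, 2], labels=[0, 1], radius=0.5))
```

In this toy example, the two samples next to an annotated point inherit its label, while the isolated sample at (10, 10) stays unlabeled. In a setting like the one the abstract describes, such a sample would be a candidate for further annotation or for semi-supervised training rather than being given a propagated label.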

DOI

https://doi.org/10.1145/3533767.3534408

Yu Li

The Chinese University of Hong Kong

Muxi Chen

The Chinese University of Hong Kong

Qiang Xu