A Universal Data Augmentation Approach for Fault Localization (ICSE 2022 - Technical Track)

Sun 8 - Fri 27 May 2022

Who

Huan Xie, Yan Lei, Meng Yan, Yue Yu, Xin Xia, Xiaoguang Mao

Track

ICSE 2022 Technical Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 11 May 2022 20:20 - 20:25 at ICSE room 1 - Machine Learning with and for SE 7 Chair(s): Lei Ma
Thu 12 May 2022 04:25 - 04:30 at ICSE room 1 - Machine Learning with and for SE 3 Chair(s): Antinisca Di Marco

Abstract

Data is the fuel of models, and this holds for fault localization (FL) as well. Many existing FL techniques take the code coverage matrix and the failure vector as inputs and are expected to find the correlation between program entities and failures. However, the input data is high-dimensional and extremely imbalanced, because real-world programs are large and failing test cases are far fewer than passing ones, and this poses a severe threat to the effectiveness of FL techniques.
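
To make the inputs concrete, here is a minimal sketch of a coverage matrix and a failure vector, scored with the Ochiai formula. Ochiai is a well-known spectrum-based FL technique chosen here purely for illustration; the abstract does not say which six techniques were evaluated. The toy data and the function name `ochiai_scores` are assumptions.

```python
import math

def ochiai_scores(coverage, failures):
    """Score each statement with Ochiai: ef / sqrt(total_failed * (ef + ep)).

    coverage[i][j] is 1 if test i executes statement j, else 0.
    failures[i] is 1 if test i fails, else 0.
    """
    total_failed = sum(failures)
    scores = []
    for j in range(len(coverage[0])):
        # ef / ep: failing / passing tests that execute statement j
        ef = sum(1 for row, f in zip(coverage, failures) if row[j] and f)
        ep = sum(1 for row, f in zip(coverage, failures) if row[j] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Toy data (hypothetical): 5 tests x 4 statements, only the last test fails.
coverage = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
failures = [0, 0, 0, 0, 1]

scores = ochiai_scores(coverage, failures)
# Statement indices ordered from most to least suspicious.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

n_failing = sum(failures)
n_passing = len(failures) - n_failing
print("passing:", n_passing, "failing:", n_failing)
print("ranking:", ranking)
```

Even in this toy case the failure vector holds a single failing test against four passing ones, which is the imbalance the abstract describes; real coverage matrices also have far more columns.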

To overcome these limitations, we propose Aeneas, a universal data augmentation approach that generates synthesized failing test cases from a reduced feature space for more precise fault localization (the name comes from the letters capitalized in "generAtEs syNthesized failing tEst cases from reduced feAture Space"). Specifically, to improve the effectiveness of data augmentation, Aeneas first applies a revised principal component analysis (PCA) to produce a reduced feature space that represents the original coverage matrix more concisely, which also makes data synthesis more efficient. Aeneas then handles the imbalanced data by generating synthesized failing test cases from the reduced feature space with a conditional variational autoencoder (CVAE). To evaluate the effectiveness of Aeneas, we conduct large-scale experiments on 458 versions of 10 programs (from ManyBugs, SIR, and Defects4J) with six state-of-the-art FL techniques. The experimental results show that Aeneas is statistically more effective than the baselines; for example, it improves the six original methods by 89% on average in Top-1 accuracy.
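
The balancing step can be sketched as follows. This is not the Aeneas implementation: the revised PCA is omitted, and plain random oversampling of existing failing rows stands in for the CVAE generator. The function name `balance_by_resampling` and the goal of equal passing and failing counts are assumptions; the abstract does not state the target ratio.

```python
import random

def balance_by_resampling(coverage, failures, seed=0):
    """Append resampled failing rows until failing tests match passing tests.

    Stand-in for a learned generator: each synthetic row is a copy of a
    randomly chosen existing failing row (random oversampling).
    """
    rng = random.Random(seed)
    failing_rows = [row for row, f in zip(coverage, failures) if f == 1]
    if not failing_rows:
        raise ValueError("need at least one failing test to resample from")
    n_pass = len(failures) - len(failing_rows)
    # Number of synthetic failing tests needed to reach a 1:1 ratio.
    n_needed = max(0, n_pass - len(failing_rows))
    synthetic = [list(rng.choice(failing_rows)) for _ in range(n_needed)]
    return coverage + synthetic, failures + [1] * n_needed

# Toy data (hypothetical): 4 passing tests, 1 failing test.
coverage = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
failures = [0, 0, 0, 0, 1]

aug_coverage, aug_failures = balance_by_resampling(coverage, failures)
print(len(aug_coverage), sum(aug_failures))
```

The augmented matrix and vector can then be passed to any FL technique that accepts these two inputs, which is the sense in which the abstract calls the approach universal. A learned generator such as a CVAE would produce new rows rather than copies.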

Link to Preprint

https://github.com/ICSE2022FL/ICSE2022FLCode/blob/master/A%20Universal%20Data%20Augmentation%20Approach%20for%20Fault%20Localization.pdf

DOI

https://doi.org/10.1145/3510003.3510136

Huan Xie, Chongqing University

Yan Lei, School of Big Data & Software Engineering, Chongqing University

Meng Yan, Chongqing University

Yue Yu, College of Computer, National University of Defense Technology, Changsha 410073, China

Xin Xia, Huawei Software Engineering Application Technology Lab, China

Xiaoguang Mao, National University of Defense Technology, China

Session Program

Wed 11 May
Displayed time zone: Eastern Time (US & Canada)

20:00 - 21:00	Machine Learning with and for SE 7 (SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers) at ICSE room 1. Chair(s): Lei Ma, University of Alberta

5m Talk		Journal First: On the Value of Oversampling for Deep Learning in Software Defect Prediction. Journal-First Papers. Rahul Yedida (North Carolina State University), Tim Menzies (North Carolina State University)
5m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges. Journal-First Papers. Frank Xu (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University, USA), Graham Neubig (Carnegie Mellon University)
5m Talk		Dependency Tracking for Risk Mitigation in Machine Learning (ML) Systems. SEIP - Software Engineering in Practice. Xiwei (Sherry) Xu (CSIRO Data61), Chen Wang (CSIRO DATA61), Zhen Wang (CSIRO Data61), Qinghua Lu (CSIRO’s Data61), Liming Zhu (CSIRO’s Data61; UNSW)
5m Talk		Strategies for Reuse and Sharing among Data Scientists in Software Teams. SEIP - Software Engineering in Practice. Will Epperson (Carnegie Mellon University), April Wang (University of Michigan), Robert DeLine (Microsoft Research), Steven M. Drucker (Microsoft Research)
5m Talk		A Universal Data Augmentation Approach for Fault Localization. Technical Track. Huan Xie (Chongqing University), Yan Lei (School of Big Data & Software Engineering, Chongqing University), Meng Yan (Chongqing University), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China), Xin Xia (Huawei Software Engineering Application Technology Lab), Xiaoguang Mao (National University of Defense Technology)
5m Talk		Explanation-Guided Fairness Testing through Genetic Algorithm. Technical Track. Ming Fan (Xi'an Jiaotong University), Wenying Wei (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Zijiang Yang (Western Michigan University), Ting Liu (Xi'an Jiaotong University)

Thu 12 May
Displayed time zone: Eastern Time (US & Canada)

04:00 - 05:00	Machine Learning with and for SE 3 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at ICSE room 1. Chair(s): Antinisca Di Marco, University of L'Aquila

5m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges. Journal-First Papers. Frank Xu (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University, USA), Graham Neubig (Carnegie Mellon University)
5m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection. Journal-First Papers. Hong Jin Kang (Singapore Management University), David Lo (Singapore Management University)
5m Talk		Dependency Tracking for Risk Mitigation in Machine Learning (ML) Systems. SEIP - Software Engineering in Practice. Xiwei (Sherry) Xu (CSIRO Data61), Chen Wang (CSIRO DATA61), Zhen Wang (CSIRO Data61), Qinghua Lu (CSIRO’s Data61), Liming Zhu (CSIRO’s Data61; UNSW)
5m Talk		DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs. Technical Track. Jialun Cao (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Meiziniu LI (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology), Xiao Chen (Huazhong University of Science and Technology), Ming Wen (Huazhong University of Science and Technology), Yongqiang Tian (The Hong Kong University of Science and Technology; University of Waterloo), Bo Wu (MIT-IBM Watson AI Lab in Cambridge), Shing-Chi Cheung (Hong Kong University of Science and Technology)
5m Talk		What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code. Technical Track. Yao Wan (Huazhong University of Science and Technology), Wei Zhao (Huazhong University of Science and Technology), Hongyu Zhang (University of Newcastle), Yulei Sui (University of Technology Sydney), Guandong Xu (University of Technology, Sydney), Hai Jin (Huazhong University of Science and Technology)
5m Talk		A Universal Data Augmentation Approach for Fault Localization. Technical Track. Huan Xie (Chongqing University), Yan Lei (School of Big Data & Software Engineering, Chongqing University), Meng Yan (Chongqing University), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China), Xin Xia (Huawei Software Engineering Application Technology Lab), Xiaoguang Mao (National University of Defense Technology)
5m Talk		DeepState: Selecting Test Suites to Enhance the Robustness of Recurrent Neural Networks. Technical Track. Zixi Liu (Nanjing University), Yang Feng (Nanjing University), Yining Yin (Nanjing University, China), Zhenyu Chen (Nanjing University)
