Boosting Code-line-level Defect Prediction with Spectrum Information and Causality Analysis
This program is tentative and subject to change.
Code-line-level defect prediction (CLDP) is an effective technique that incorporates comprehensive measures to identify buggy lines, thereby optimizing effort in Software Quality Assurance activities. Most CLDP methods either consider only the textual information of the code or rely merely on file-level label information; they have not fully leveraged the essential information available in the CLDP context, and historical code-line-level labels in particular have been largely overlooked. Due to the vast number of code lines and the sparsity of the tokens they contain, leveraging historical code-line-level label information remains a significant challenge.
To address this issue, we propose a novel CLDP method, Spectrum infOrmation and caUsality aNalysis based coDe-line-level defect prediction (SOUND). SOUND incorporates two key ideas: (a) it introduces a spectrum-information perspective, using labels from historical defective lines to quantify each token's contribution to line-level defects, and (b) it applies causal analysis to obtain a more systematic and comprehensive understanding of the causal relationships between tokens and defects. In a comprehensive study involving 142 releases across 19 software projects, the experimental results show that our method significantly outperforms existing state-of-the-art (SOTA) CLDP baselines in ranking defective lines under three indicators: IFA, Recall@Top20%LOC, and Effort@Top20%Recall. Notably, our method achieves an IFA score of 0 in most cases, indicating that the first line in the ranking list it generates is actually defective, which significantly enhances its practicality.
Fri 2 May (displayed time zone: Eastern Time, US & Canada)

11:00 - 12:30 | Program Comprehension 3 | Research Track / Journal-first Papers | Room 204 | Chair(s): Arie van Deursen (TU Delft)

- 11:00, 15m talk: Automated Test Generation For Smart Contracts via On-Chain Test Case Augmentation and Migration. Blockchain, Research Track. Jiashuo Zhang (Peking University, China), Jiachi Chen (Sun Yat-sen University), John Grundy (Monash University), Jianbo Gao (Peking University), Yanlin Wang (Sun Yat-sen University), Ting Chen (University of Electronic Science and Technology of China), Zhi Guan (Peking University), Zhong Chen
- 11:15, 15m talk: Boosting Code-line-level Defect Prediction with Spectrum Information and Causality Analysis. Research Track. Shiyu Sun, Yanhui Li (Nanjing University), Lin Chen (Nanjing University), Yuming Zhou (Nanjing University), Jianhua Zhao (Nanjing University, China)
- 11:30, 15m talk: BatFix: Repairing language model-based transpilation. Journal-first Papers. Daniel Ramos (INESC-ID / IST, ULisboa, and Carnegie Mellon University), Ines Lynce (INESC-ID/IST, Universidade de Lisboa), Vasco Manquinho (INESC-ID, Universidade de Lisboa), Ruben Martins (Carnegie Mellon University), Claire Le Goues (Carnegie Mellon University)
- 11:45, 15m talk: Tracking the Evolution of Static Code Warnings: The State-of-the-Art and a Better Approach. Journal-first Papers.
- 12:00, 15m talk: PACE: A Program Analysis Framework for Continuous Performance Prediction. Journal-first Papers.
- 12:15, 15m talk: Mimicking Production Behavior With Generated Mocks. Journal-first Papers. Deepika Tiwari (KTH Royal Institute of Technology), Martin Monperrus (KTH Royal Institute of Technology), Benoit Baudry (Université de Montréal)