The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study (ICSE 2022 - Journal-First Papers)

Sun 8 - Fri 27 May 2022

Who

Masud Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

Track

ICSE 2022 Journal-First Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 10 May 2022 21:00 - 21:05 at ICSE room 1 - Faults and Services Chair(s): Anand Ashok Sawant
Fri 13 May 2022 04:00 - 04:05 at ICSE room 2 - Fault Localization Chair(s): Arosha K Bandara

Abstract

Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches depends heavily on the bug reports they use. A significant number of bug reports contain only plain natural language text. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. At the same time, recent evidence suggests that even these natural language-only reports contain enough good keywords to localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to clarify this aspect by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal or near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts even though they might not contain explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in several keyword characteristics (e.g., frequency, entropy, position, part of speech).
This analysis led us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate a 27%–34% improvement in the performance of non-optimal queries by applying these insights to them. Finally, we summarize our study findings with future research directions (e.g., machine intelligence in keyword selection).
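
To make the idea of searching a bug report for a near-optimal query more concrete, here is a minimal Python sketch of a genetic algorithm that keeps or drops each report keyword. It is not the authors' implementation. The toy corpus, the TF-IDF-style overlap score, the reciprocal-rank fitness, and the names tokenize, rank_of_target, fitness and evolve_query are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simplifying assumption)."""
    return [t for t in text.lower().split() if t.isalpha()]


def rank_of_target(query, corpus, target):
    """1-based rank of `target` under a simple TF-IDF overlap score.

    Ties are counted against the target, so an uninformative query ranks it last.
    """
    n = len(corpus)
    df = Counter(term for doc in corpus.values() for term in set(doc))

    def score(doc):
        tf = Counter(doc)
        return sum(tf[t] * math.log(1 + n / df[t]) for t in query if t in tf)

    target_score = score(corpus[target])
    return 1 + sum(
        1 for name, doc in corpus.items()
        if name != target and score(doc) >= target_score
    )


def fitness(mask, terms, corpus, target):
    """Reciprocal rank of the buggy file for the keywords selected by `mask`."""
    query = [t for t, keep in zip(terms, mask) if keep]
    if not query:
        return 0.0
    return 1.0 / rank_of_target(query, corpus, target)


def evolve_query(report, corpus, target, generations=30, pop_size=20, seed=0):
    """Evolve keep/drop bit masks over the report's vocabulary (needs >= 2 terms)."""
    rng = random.Random(seed)
    terms = sorted(set(tokenize(report)))
    # Seed the population with the full report so the result is never worse than it.
    population = [[1] * len(terms)]
    population += [[rng.randint(0, 1) for _ in terms] for _ in range(pop_size - 1)]

    def key(mask):
        return fitness(mask, terms, corpus, target)

    for _ in range(generations):
        population.sort(key=key, reverse=True)
        elite = population[: pop_size // 2]           # elitism: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(terms))        # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(terms))             # flip one bit as mutation
            child[i] = 1 - child[i]
            children.append(child)
        population = elite + children

    best = max(population, key=key)
    return [t for t, keep in zip(terms, best) if keep], key(best)


# Hypothetical three-file corpus; "Cache.java" plays the role of the buggy file.
corpus = {
    "Parser.java": tokenize("parser parse token stream syntax error recover"),
    "Renderer.java": tokenize("render window screen paint crash display blank app"),
    "Cache.java": tokenize("cache evict entry memory stale"),
}
report = "the app will crash and the screen goes blank when a stale entry is read"

query, score = evolve_query(report, corpus, target="Cache.java")
print(query, score)
```

In this toy setting the full report text ranks the buggy file second, because symptom words such as "crash" and "screen" match an unrelated file. Since the full report is part of the initial population and the best half survives each generation, the evolved query can never score below that baseline. Note that the fitness function needs the ground-truth file, so a search like this is a way to study what good queries look like, not a localization technique that could be deployed on unresolved bugs.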

Link to Publication

https://link.springer.com/article/10.1007/s10664-021-10022-4

Link to Preprint

https://web.cs.dal.ca/~masud/papers/masud-EMSE2021.pdf

DOI

https://doi.org/10.1007/s10664-021-10022-4

Masud Rahman

Dalhousie University

Canada

Foutse Khomh

Polytechnique Montréal

Canada

Shamima Yeasmin

University of Saskatchewan

Chanchal K. Roy

University of Saskatchewan

Canada

Session Program

Tue 10 May
Displayed time zone: Eastern Time (US & Canada)

21:00 - 22:00	Faults and Services (SEIS - Software Engineering in Society / Technical Track / Journal-First Papers) at ICSE room 1. Chair(s): Anand Ashok Sawant (University of California, Davis)

5m Talk		The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study. Journal-First Papers. Masud Rahman (Dalhousie University), Foutse Khomh (Polytechnique Montréal), Shamima Yeasmin (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan)
5m Talk		Software Engineers’ Response to Public Crisis: Lessons Learnt from Spontaneously Building an Informative COVID-19 Dashboard. SEIS - Software Engineering in Society. Han Wang (Monash University), Chao Wu (Monash University), Chunyang Chen (Monash University), Burak Turhan (University of Oulu), Shiping Chen (Data61 at CSIRO, Australia / UNSW, Australia), Jon Whittle (CSIRO's Data61 and Monash University)
5m Talk		Fault Localization via Efficient Probabilistic Modeling of Program Semantics. Technical Track. Muhan Zeng (Peking University), Yiqian Wu (Peking University), Zhentao Ye (Peking University), Yingfei Xiong (Peking University), Xin Zhang (Peking University), Lu Zhang (Peking University)
5m Talk		Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching. Technical Track. Zhuangbin Chen (Chinese University of Hong Kong, China), Jinyang Liu, Yuxin Su (Sun Yat-sen University), Hongyu Zhang (University of Newcastle), Xiao Ling (Huawei Technologies), Yongqiang Yang (Huawei Technologies), Michael Lyu (The Chinese University of Hong Kong)
5m Talk		Eflect: Porting Energy-Aware Applications to Shared Environments. Technical Track. Timur Babakol (SUNY Binghamton, USA), Anthony Canino (University of Pennsylvania, USA), Yu David Liu (SUNY Binghamton)

Fri 13 May
Displayed time zone: Eastern Time (US & Canada)

04:00 - 05:00	Fault Localization (Technical Track / NIER - New Ideas and Emerging Results / Journal-First Papers) at ICSE room 2. Chair(s): Arosha K Bandara (The Open University)

5m Talk		The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study. Journal-First Papers. Masud Rahman (Dalhousie University), Foutse Khomh (Polytechnique Montréal), Shamima Yeasmin (University of Saskatchewan), Chanchal K. Roy (University of Saskatchewan)
5m Talk		Utilising Persistence for Post Facto Suppression of Invalid Anomalies using System Logs. NIER - New Ideas and Emerging Results. Seema Nagar (IBM Research), Pooja Aggarwal (IBM Research), Dipanwita Guhathakurta (IIIT Hyderabad), Bing Zhou (IBM Research), Rohan Arora (IBM Research)
5m Talk		Fault Localization via Efficient Probabilistic Modeling of Program Semantics. Technical Track. Muhan Zeng (Peking University), Yiqian Wu (Peking University), Zhentao Ye (Peking University), Yingfei Xiong (Peking University), Xin Zhang (Peking University), Lu Zhang (Peking University)
