Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (ICSE 2022 - Technical Track)

Who

Hong Jin Kang, Khai Loong Aw, David Lo

Track

ICSE 2022 Technical Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 9 May 2022 22:25 - 22:30 at ICSE room 1 - Machine Learning with and for SE 5 Chair(s): Jürgen Cito
Wed 11 May 2022 05:15 - 05:20 at ICSE room 1 - Machine Learning with and for SE 2 Chair(s): Gemma Catolino
Wed 25 May 2022 11:05 - 11:10 at Room 301+302 - Papers 6: Machine Learning with and for SE 1 Chair(s): Baishakhi Ray
Wed 25 May 2022 13:30 - 15:00 at Ballroom Gallery - Posters 1

Abstract

Automatic static analysis tools (ASATs), such as FindBugs, have a high false alarm rate. The large number of false alarms produced poses a barrier to adoption. Researchers have proposed the use of machine learning to prune false alarms and present only actionable warnings to developers. The state-of-the-art study has identified a set of “Golden Features” based on metrics computed over the characteristics and history of the file, code, and warning. Recent studies show that machine learning using these features is extremely effective and that they achieve almost perfect performance.

We perform a detailed analysis to better understand the strong performance of the “Golden Features”. We found that several studies used an experimental procedure that results in data leakage and data duplication, which are subtle issues with significant implications. Firstly, the ground-truth labels have leaked into features that measure the proportion of actionable warnings in a given context. Secondly, many warnings in the testing dataset appear in the training dataset. Next, we demonstrate limitations in the warning oracle that determines the ground-truth labels, a heuristic comparing warnings in a given revision to a reference revision in the future. We show the choice of reference revision influences the warning distribution. Moreover, the heuristic produces labels that do not agree with human oracles. Hence, the strong performance of these techniques previously seen is overoptimistic of their true performance if adopted in practice. Our results convey several lessons and provide guidelines for evaluating false alarm detectors.
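
The two issues named in the abstract, label leakage and train/test duplication, can be illustrated with a small sketch. This is not the authors' code: the `WarningRecord` type, its fields, the helper names, and the sample data are all hypothetical and chosen only for illustration. The sketch measures the share of test warnings that also occur in the training set, and contrasts a "proportion of actionable warnings per file" feature computed over all labels (leaky) with one computed from training labels only.

```python
from collections import defaultdict
from typing import NamedTuple


class WarningRecord(NamedTuple):
    """Hypothetical warning record; the field names are assumptions."""
    file: str
    bug_type: str
    line: int
    actionable: bool  # ground-truth label


def key(w: WarningRecord) -> tuple:
    # Identity of a warning, deliberately excluding its label.
    return (w.file, w.bug_type, w.line)


def duplicate_rate(train, test) -> float:
    """Fraction of test warnings that also appear in the training set."""
    seen = {key(w) for w in train}
    return sum(key(w) in seen for w in test) / len(test)


def actionable_ratio_by_file(warnings) -> dict:
    """Proportion of actionable warnings per file, over the given warnings."""
    counts = defaultdict(lambda: [0, 0])  # file -> [actionable, total]
    for w in warnings:
        counts[w.file][0] += w.actionable
        counts[w.file][1] += 1
    return {f: a / t for f, (a, t) in counts.items()}


# Made-up data for illustration only.
train = [
    WarningRecord("A.java", "TYPE_A", 10, True),
    WarningRecord("A.java", "TYPE_B", 20, False),
    WarningRecord("B.java", "TYPE_A", 5, False),
]
test = [
    WarningRecord("A.java", "TYPE_A", 10, True),  # also present in train
    WarningRecord("B.java", "TYPE_C", 7, True),
]

dup = duplicate_rate(train, test)

# Leaky: the feature is computed with the test labels included.
leaky = actionable_ratio_by_file(train + test)
# Safer: the feature only sees labels available at training time.
safe = actionable_ratio_by_file(train)

print(f"duplicated test warnings: {dup:.0%}")
print("leaky feature:", leaky)
print("train-only feature:", safe)
```

In this toy data, half of the test warnings are duplicates of training warnings, and the leaky feature for `B.java` changes from 0.0 to 0.5 once the test label is included, which is the kind of signal a classifier can exploit without learning anything that would transfer to practice.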

Link to Preprint

https://arxiv.org/abs/2202.05982

DOI

https://doi.org/10.1145/3510003.3510214

File attachments

Poster (Poster_icse_how_far.pdf)	547KiB

Hong Jin Kang

Singapore Management University

Khai Loong Aw

Singapore Management University

David Lo

Singapore Management University

Singapore

Session Program

Mon 9 May
Displayed time zone: Eastern Time (US & Canada)

22:00 - 23:00	Machine Learning with and for SE 5 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at ICSE room 1 Chair(s): Jürgen Cito TU Wien and Meta

5m Talk		Automatic Fault Detection for Deep Learning Programs Using Graph Transformations Journal-First Papers Amin Nikanjam École Polytechnique de Montréal, Houssem Ben Braiek École Polytechnique de Montréal, Mohammad Mehdi Morovati École Polytechnique de Montréal, Foutse Khomh Polytechnique Montréal Link to publication DOI Media Attached
5m Talk		Counterfactual Explanations for Models of Code SEIP - Software Engineering in Practice Jürgen Cito TU Wien and Meta, Işıl Dillig University of Texas at Austin, Vijayaraghavan Murali Meta Platforms, Inc., Satish Chandra Facebook Pre-print Media Attached
5m Talk		VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning Technical Track Qibin Chen Carnegie Mellon University, Jeremy Lacomis Carnegie Mellon University, Edward J. Schwartz Carnegie Mellon University Software Engineering Institute, Graham Neubig Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Claire Le Goues Carnegie Mellon University DOI Pre-print Media Attached
5m Talk		Towards Training Reproducible Deep Learning Models Technical Track Boyuan Chen Centre for Software Excellence, Huawei Canada, Mingzhi Wen Huawei Technologies, Yong Shi Huawei Technologies, Dayi Lin Centre for Software Excellence, Huawei, Canada, Gopi Krishnan Rajbahadur Centre for Software Excellence, Huawei, Canada, Zhen Ming (Jack) Jiang York University Pre-print Media Attached
5m Talk		Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process (Distinguished Paper Award) Technical Track Nadia Nahar Carnegie Mellon University, Shurui Zhou University of Toronto, Grace Lewis Carnegie Mellon Software Engineering Institute, Christian Kästner Carnegie Mellon University Pre-print Media Attached
5m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached

Wed 11 May
Displayed time zone: Eastern Time (US & Canada)

05:00 - 06:00	Machine Learning with and for SE 2 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at ICSE room 1 Chair(s): Gemma Catolino Tilburg University & Jheronimus Academy of Data Science

5m Talk		Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection Journal-First Papers Nadia Daoudi SnT, University of Luxembourg, Kevin Allix University of Luxembourg, Tegawendé F. Bissyandé SnT, University of Luxembourg, Jacques Klein University of Luxembourg Link to publication Pre-print Media Attached
5m Talk		Mining Root Cause Knowledge from Cloud Service Incident Investigations for AIOps SEIP - Software Engineering in Practice Amrita Saha Salesforce Research Asia, Steven C.H. Hoi Salesforce Research Asia Pre-print Media Attached
5m Talk		Improving Machine Translation Systems via Isotopic Replacement Technical Track Zeyu Sun Peking University, Jie M. Zhang King's College London, Yingfei Xiong Peking University, Mark Harman University College London, Mike Papadakis University of Luxembourg, Luxembourg, Lu Zhang Peking University Pre-print Media Attached
5m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached
5m Talk		DeepAnalyze: Learning to Localize Crashes at Scale Technical Track Manish Shetty Microsoft Research, India, Chetan Bansal Microsoft Research, Suman Nath Microsoft Corporation, Sean Bowles Microsoft, Henry Wang Microsoft, Ozgur Arman Microsoft, Siamak Ahari Microsoft Pre-print Media Attached

Wed 25 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Papers 6: Machine Learning with and for SE 1 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at Room 301+302 Chair(s): Baishakhi Ray Columbia University

11:00 5m Talk		Improving Machine Translation Systems via Isotopic Replacement Technical Track Zeyu Sun Peking University, Jie M. Zhang King's College London, Yingfei Xiong Peking University, Mark Harman University College London, Mike Papadakis University of Luxembourg, Luxembourg, Lu Zhang Peking University Pre-print Media Attached
11:05 5m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached
11:10 5m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
11:15 5m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges Journal-First Papers Frank Xu Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Graham Neubig Carnegie Mellon University
11:20 5m Talk		Strategies for Reuse and Sharing among Data Scientists in Software Teams SEIP - Software Engineering in Practice Will Epperson Carnegie Mellon University, April Wang University of Michigan, Robert DeLine Microsoft Research, Steven M. Drucker Microsoft Research Pre-print Media Attached
11:25 5m Talk		Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules Technical Track Rangeet Pan Iowa State University, USA, Hridesh Rajan Iowa State University Pre-print Media Attached
11:30 5m Talk		Fairness-aware Configuration of Machine Learning Libraries Technical Track Saeid Tizpaz-Niari University of Texas at El Paso, Ashish Kumar, Gang (Gary) Tan Pennsylvania State University, Ashutosh Trivedi University of Colorado Boulder DOI Pre-print Media Attached
11:35 5m Talk		Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study Technical Track Saad Ezzini University of Luxembourg, Sallam Abualhaija University of Luxembourg, Chetan Arora Deakin University, Mehrdad Sabetzadeh University of Ottawa Pre-print Media Attached

13:30 - 15:00	Posters 1 (Journal-First Papers / SEIP - Software Engineering in Practice / SEET - Software Engineering Education and Training / Technical Track) at Ballroom Gallery

13:30 90m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges Journal-First Papers Frank Xu Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Graham Neubig Carnegie Mellon University
13:30 90m Talk		Strategies for Reuse and Sharing among Data Scientists in Software Teams SEIP - Software Engineering in Practice Will Epperson Carnegie Mellon University, April Wang University of Michigan, Robert DeLine Microsoft Research, Steven M. Drucker Microsoft Research Pre-print Media Attached
13:30 90m Talk		Debugging with Stack Overflow: Web Search Behavior in Novice and Expert Programmers SEET - Software Engineering Education and Training Annie Li University of Michigan, Madeline Endres University of Michigan, Westley Weimer University of Michigan DOI Pre-print Media Attached
13:30 90m Talk		Static Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries (Best Artifact Award) Technical Track Quentin Stiévenart Vrije Universiteit Brussel, David Binkley Loyola University Maryland, Coen De Roover Vrije Universiteit Brussel DOI Pre-print Media Attached
13:30 90m Talk		Linear-time Temporal Logic guided Greybox Fuzzing Technical Track Ruijie Meng National University of Singapore, Singapore, Zhen Dong Fudan University, China, Jialin Li National University of Singapore, Singapore, Ivan Beschastnikh University of British Columbia, Abhik Roychoudhury National University of Singapore DOI Pre-print Media Attached
13:30 90m Talk		Individual differences limit predicting well-being and productivity using software repositories: a longitudinal industrial study Journal-First Papers Miikka Kuutila University of Oulu, Mika Mäntylä University of Oulu, Maëlick Claes University of Oulu, Marko Elovainio University of Helsinki, Bram Adams Queen's University, Kingston, Ontario Link to publication Media Attached
13:30 90m Talk		The Agile Success Model: A Mixed-methods Study of a Large-scale Agile Transformation Journal-First Papers Daniel Russo Department of Computer Science, Aalborg University Link to publication DOI Pre-print
13:30 90m Talk		PReach: A Heuristic for Probabilistic Reachability to Identify Hard to Reach Statements Technical Track Seemanta Saha University of California Santa Barbara, Mara Downing University of California, Santa Barbara, Tegan Brennan, Tevfik Bultan University of California, Santa Barbara Pre-print Media Attached
13:30 90m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
13:30 90m Talk		Toward Among-Device AI from On-Device AI with Stream Pipelines SEIP - Software Engineering in Practice MyungJoo Ham Samsung Electronics, Sangjung Woo Samsung Electronics, Jaeyun Jung Samsung Electronics, Wook Song Samsung Electronics, Gichan Jang Samsung Electronics, Yongjoo Ahn Samsung Electronics, Hyoungjoo Ahn Samsung Electronics Pre-print Media Attached
13:30 90m Talk		Integrating Hackathons into an Online Cybersecurity Course SEET - Software Engineering Education and Training Abasi-amefon Obot Affia University of Tartu, Estonia, Alexander Nolte University of Tartu, Raimundas Matulevičius University of Tartu, Estonia DOI Pre-print Media Attached
13:30 90m Talk		Verifying Dynamic Trait Objects in Rust SEIP - Software Engineering in Practice Alexa VanHattum Cornell University, Daniel Schwartz-Narbonne Amazon, n.n., Nathan Chong Amazon, Adrian Sampson Cornell University Pre-print Media Attached
13:30 90m Talk		Automatically Identifying Shared Root Causes of Test Breakages in SAP HANA SEIP - Software Engineering in Practice Gabin An KAIST, Juyeon Yoon Korea Advanced Institute of Science and Technology, Jeongju Sohn University of Luxembourg, Jingun Hong SAP Labs, Dongwon Hwang SAP Labs, Shin Yoo KAIST Pre-print Media Attached
13:30 90m Talk		Guiding Peer-feedback in Learning Software Design using UML SEET - Software Engineering Education and Training Satrio Adi Rukmono Institut Teknologi Bandung, Michel Chaudron Eindhoven University of Technology, The Netherlands Pre-print Media Attached
13:30 90m Talk		Fairness-aware Configuration of Machine Learning Libraries Technical Track Saeid Tizpaz-Niari University of Texas at El Paso, Ashish Kumar, Gang (Gary) Tan Pennsylvania State University, Ashutosh Trivedi University of Colorado Boulder DOI Pre-print Media Attached
13:30 90m Talk		Using Pre-Trained Models to Boost Code Review Automation Technical Track Rosalia Tufano Università della Svizzera Italiana, Simone Masiero Software Institute @ Università della Svizzera Italiana, Antonio Mastropaolo Università della Svizzera italiana, Luca Pascarella Università della Svizzera italiana (USI), Denys Poshyvanyk William and Mary, Gabriele Bavota Software Institute, USI Università della Svizzera italiana Pre-print Media Attached
13:30 90m Talk		Automatic Anti-Pattern Detection in Microservice Architectures based on Distributed Tracing SEIP - Software Engineering in Practice Tim Hubener ING Bank N.V., Yaping Luo ING; Eindhoven University of Technology, Pieter Vallen ING, Jonck van der Kogel ING Bank N.V., Tom Liefheid ING Bank N.V., Michel Chaudron Eindhoven University of Technology, The Netherlands Media Attached
13:30 90m Talk		Retrieving Data Constraint Implementations Using Fine-Grained Code Patterns Technical Track Juan Manuel Florez The University of Texas at Dallas, Jonathan Perry The University of Texas at Dallas, Shiyi Wei University of Texas at Dallas, Andrian Marcus University of Texas at Dallas Pre-print Media Attached
13:30 90m Talk		Verification of Consistency between Process Models, Object Life Cycles, and Context-dependent Semantic Specifications Journal-First Papers Ralph Hoch Institute of Computer Technology, TU Wien, Christoph Luckeneder Vienna University of Technology, Roman Popp TU Wien, Vienna, Austria, Hermann Kaindl Institute of Computer Technology, TU Wien Link to publication DOI Pre-print Media Attached
13:30 90m Talk		If a Human Can See It, So Should Your System: Reliability Requirements for Machine Vision Components Technical Track Boyue Caroline Hu University of Toronto, Lina Marsso University of Toronto, Krzysztof Czarnecki University of Waterloo, Canada, Rick Salay University of Toronto, Huakun Shen University of Toronto, Marsha Chechik University of Toronto DOI Pre-print Media Attached
13:30 90m Talk		Preparing Software Engineers to Develop Robot Systems SEET - Software Engineering Education and Training Carl Hildebrandt University of Virginia, Meriel von Stein University of Virginia, Trey Woodlief University of Virginia, Sebastian Elbaum University of Virginia DOI Pre-print Media Attached
13:30 90m Poster		EUGAIN. The European Network For Gender Balance in Informatics Technical Track Valentina Lenarduzzi University of Oulu, Barbora Buhnova Masaryk University, Letizia Jaccheri Norwegian University of Science and Technology
13:30 90m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached
13:30 90m Talk		An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags Journal-First Papers Christian D. Newman Rochester Institute of Technology, Michael J. Decker Bowling Green State University, Reem S. Alsuhaibani Kent State University, Anthony Peruma Rochester Institute of Technology, Mohamed Wiem Mkaouer Rochester Institute of Technology, Satyajit Mohapatra Rochester Institute of Technology, Tejal Vishnoi Rochester Institute of Technology, Marcos Zampieri Rochester Institute of Technology, Timothy Sheldon BNY Mellon, Emily Hill Drew University Link to publication DOI Pre-print Media Attached
13:30 90m Talk		Counterfactual Explanations for Models of Code SEIP - Software Engineering in Practice Jürgen Cito TU Wien and Meta, Işıl Dillig University of Texas at Austin, Vijayaraghavan Murali Meta Platforms, Inc., Satish Chandra Facebook Pre-print Media Attached
13:30 90m Talk		Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies Technical Track Jibesh Patra University of Stuttgart, Michael Pradel University of Stuttgart Pre-print Media Attached
13:30 90m Talk		Learning to Find Usages of Library Functions in Optimized Binaries Journal-First Papers Toufique Ahmed University of California at Davis, Prem Devanbu Department of Computer Science, University of California, Davis, Anand Ashok Sawant University of California, Davis Link to publication DOI Pre-print Media Attached
13:30 90m Talk		DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning Technical Track Eliska Kloberdanz Iowa State University, Kyle Kloberdanz Cape Privacy, Wei Le Iowa State University Pre-print Media Attached
13:30 90m Talk		Fuzzing Class Specifications Technical Track Facundo Molina University of Rio Cuarto and CONICET, Argentina, Marcelo d'Amorim Federal University of Pernambuco, Nazareno Aguirre University of Rio Cuarto and CONICET, Argentina Pre-print Media Attached
13:30 90m Talk		Journal First Submission of the Article: What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk Journal-First Papers Pooja Rani University of Bern, Sebastiano Panichella Zurich University of Applied Sciences, Manuel Leuenberger Software Composition Group, University of Bern, Switzerland, Mohammad Ghafari School of Computer Science, University of Auckland, Oscar Nierstrasz University of Bern, Switzerland Link to publication DOI Authorizer link Media Attached
