Active Learning of Discriminative Subgraph Patterns for API Misuse Detection (ICSE 2022 - Journal-First Papers)

Who

Hong Jin Kang, David Lo

Track

ICSE 2022 Journal-First Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 12 May 2022 04:05 - 04:10 at ICSE room 1 - Machine Learning with and for SE 3 Chair(s): Antinisca Di Marco
Thu 12 May 2022 21:00 - 21:05 at ICSE room 2 - Machine Learning with and for SE 8 Chair(s): Seok-Won Lee
Wed 25 May 2022 11:10 - 11:15 at Room 301+302 - Papers 6: Machine Learning with and for SE 1 Chair(s): Baishakhi Ray
Wed 25 May 2022 13:30 - 15:00 at Ballroom Gallery - Posters 1

Abstract

A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect them, studies have shown that these techniques fail to detect misuses reliably and report many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage: many approaches mistake a usage pattern’s frequency for correctness. Because many alternative usage patterns are uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), which reformulates API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. ALP still incorporates frequency information but, through limited human supervision, reduces the reliance on the assumption relating frequency and correctness. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.
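
The two ideas named in the abstract, discriminative subgraph patterns and active selection of examples to label, can be illustrated with a toy sketch. This is not ALP's implementation: the graph encoding (a set of call-order edges), the support-gap score, the `min_gap` threshold, and all function names are assumptions made for illustration, and "subgraphs" are simplified to small edge subsets. Only uncertainty-based selection is shown; the representativeness criterion mentioned in the abstract is omitted.

```python
from itertools import combinations

# Assumption: each API usage is a toy "graph", a frozenset of call-order edges.

def subgraphs(graph, max_edges=2):
    """Enumerate edge subsets of size 1..max_edges as candidate patterns."""
    edges = sorted(graph)
    for k in range(1, max_edges + 1):
        for combo in combinations(edges, k):
            yield frozenset(combo)

def support(pattern, graphs):
    """Fraction of graphs that contain every edge of the pattern."""
    if not graphs:
        return 0.0
    return sum(pattern <= g for g in graphs) / len(graphs)

def discriminative_patterns(correct, misuse, min_gap=0.5):
    """Keep patterns whose support differs strongly between the two classes."""
    candidates = set()
    for g in correct + misuse:
        candidates.update(subgraphs(g))
    scored = {}
    for p in candidates:
        gap = support(p, correct) - support(p, misuse)
        if abs(gap) >= min_gap:
            scored[p] = gap
    return scored

def score(graph, patterns):
    """Sum the gaps of matched patterns: positive leans correct, negative leans misuse."""
    return sum(gap for p, gap in patterns.items() if p <= graph)

def most_uncertain(unlabeled, patterns):
    """Uncertainty sampling: pick the usage whose score is closest to zero."""
    return min(unlabeled, key=lambda g: abs(score(g, patterns)))

# Hypothetical labeled usages of an iterator API.
correct = [
    frozenset({("iterator", "hasNext"), ("hasNext", "next")}),
    frozenset({("iterator", "hasNext"), ("hasNext", "next"), ("next", "remove")}),
]
misuse = [
    frozenset({("iterator", "next")}),
    frozenset({("iterator", "next"), ("next", "remove")}),
]
patterns = discriminative_patterns(correct, misuse)

# Hypothetical unlabeled usages; the one matching no pattern is queried next.
u1 = frozenset({("iterator", "hasNext"), ("hasNext", "next")})
u2 = frozenset({("iterator", "next")})
u3 = frozenset({("next", "remove")})
query = most_uncertain([u1, u2, u3], patterns)
print("ask a human to label:", sorted(query))
```

In this sketch the edge `("next", "remove")` is equally frequent in both classes, so it is discarded as non-discriminative, and the usage containing only that edge is the one handed to a human labeler. Frequency still enters through `support`, but the labels decide which patterns count.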

Link to Preprint

https://arxiv.org/abs/2204.09945

File attachments

Poster (Poster_active_learning.pdf)	556KiB

Hong Jin Kang

Singapore Management University

David Lo

Singapore Management University

Singapore

Session Program

Thu 12 May
Displayed time zone: Eastern Time (US & Canada)

04:00 - 05:00	Machine Learning with and for SE 3 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at ICSE room 1 Chair(s): Antinisca Di Marco University of L'Aquila

5m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges Journal-First Papers Frank Xu Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Graham Neubig Carnegie Mellon University
5m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
5m Talk		Dependency Tracking for Risk Mitigation in Machine Learning (ML) Systems SEIP - Software Engineering in Practice Xiwei (Sherry) Xu CSIRO Data61, Chen Wang CSIRO DATA61, Zhen Wang CSIRO Data61, Qinghua Lu CSIRO’s Data61, Liming Zhu CSIRO’s Data61; UNSW Media Attached
5m Talk		DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs Technical Track Jialun Cao Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Meiziniu LI Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Xiao Chen Huazhong University of Science and Technology, Ming Wen Huazhong University of Science and Technology, Yongqiang Tian The Hong Kong University of Science and Technology; University of Waterloo, Bo Wu MIT-IBM Watson AI Lab in Cambridge, Shing-Chi Cheung Hong Kong University of Science and Technology DOI Pre-print Media Attached
5m Talk		What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code Technical Track Yao Wan Huazhong University of Science and Technology, Wei Zhao Huazhong University of Science and Technology, Hongyu Zhang University of Newcastle, Yulei Sui University of Technology Sydney, Guandong Xu University of Technology, Sydney, Hai Jin Huazhong University of Science and Technology Pre-print Media Attached
5m Talk		A Universal Data Augmentation Approach for Fault Localization Technical Track Huan Xie Chongqing University, Yan Lei School of Big Data & Software Engineering, Chongqing University, Meng Yan Chongqing University, Yue Yu College of Computer, National University of Defense Technology, Changsha 410073, China, Xin Xia Huawei Software Engineering Application Technology Lab, Xiaoguang Mao National University of Defense Technology DOI Pre-print Media Attached
5m Talk		DeepState: Selecting Test Suites to Enhance the Robustness of Recurrent Neural Networks Technical Track Zixi Liu Nanjing University, Yang Feng Nanjing University, Yining Yin Nanjing University, China, Zhenyu Chen Nanjing University DOI Pre-print Media Attached

21:00 - 22:00	Machine Learning with and for SE 8 (Technical Track / Journal-First Papers) at ICSE room 2 Chair(s): Seok-Won Lee Ajou University

5m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
5m Talk		Defect Reduction Planning (using TimeLIME) Journal-First Papers Kewen Peng North Carolina State University, Tim Menzies North Carolina State University Authorizer link Pre-print Media Attached
5m Talk		Learning to Reduce False Positives in Analytic Bug Detectors Technical Track Anant Kharkar Microsoft, Roshanak Zilouchian Moghaddam Microsoft, Matthew Jin Microsoft Corporation, Xiaoyu Liu Microsoft Corporation, Xin Shi Microsoft Corporation, Colin Clement Microsoft, Neel Sundaresan Microsoft Corporation Pre-print Media Attached
5m Talk		Learning to Recommend Method Names with Global Context Technical Track Fang Liu Peking University, Ge Li Peking University, Zhiyi Fu Peking University, Shuai Lu Peking University, Yiyang Hao Silicon Heart Tech Co., Zhi Jin Peking University Pre-print Media Attached
5m Talk		Adaptive Test Selection for Deep Neural Networks Technical Track Xinyu Gao Nanjing University, Yang Feng Nanjing University, Yining Yin Nanjing University, China, Zixi Liu Nanjing University, Zhenyu Chen Nanjing University, Baowen Xu Nanjing University Pre-print Media Attached
5m Talk		Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source Technical Track Anjiang Wei Stanford University, Yinlin Deng University of Illinois at Urbana-Champaign, Chenyuan Yang Nanjing University, Lingming Zhang University of Illinois at Urbana-Champaign Pre-print Media Attached

Wed 25 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Papers 6: Machine Learning with and for SE 1 (Technical Track / Journal-First Papers / SEIP - Software Engineering in Practice) at Room 301+302 Chair(s): Baishakhi Ray Columbia University

11:00 5m Talk		Improving Machine Translation Systems via Isotopic Replacement Technical Track Zeyu Sun Peking University, Jie M. Zhang King's College London, Yingfei Xiong Peking University, Mark Harman University College London, Mike Papadakis University of Luxembourg, Luxembourg, Lu Zhang Peking University Pre-print Media Attached
11:05 5m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached
11:10 5m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
11:15 5m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges Journal-First Papers Frank Xu Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Graham Neubig Carnegie Mellon University
11:20 5m Talk		Strategies for Reuse and Sharing among Data Scientists in Software Teams SEIP - Software Engineering in Practice Will Epperson Carnegie Mellon University, April Wang University of Michigan, Robert DeLine Microsoft Research, Steven M. Drucker Microsoft Research Pre-print Media Attached
11:25 5m Talk		Decomposing Convolutional Neural Networks into Reusable and Replaceable Modules Technical Track Rangeet Pan Iowa State University, USA, Hridesh Rajan Iowa State University Pre-print Media Attached
11:30 5m Talk		Fairness-aware Configuration of Machine Learning Libraries Technical Track Saeid Tizpaz-Niari University of Texas at El Paso, Ashish Kumar, Gang (Gary) Tan Pennsylvania State University, Ashutosh Trivedi University of Colorado Boulder DOI Pre-print Media Attached
11:35 5m Talk		Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study Technical Track Saad Ezzini University of Luxembourg, Sallam Abualhaija University of Luxembourg, Chetan Arora Deakin University, Mehrdad Sabetzadeh University of Ottawa Pre-print Media Attached

13:30 - 15:00	Posters 1 (Journal-First Papers / SEIP - Software Engineering in Practice / SEET - Software Engineering Education and Training / Technical Track) at Ballroom Gallery

13:30 90m Talk		In-IDE Code Generation from Natural Language: Promise and Challenges Journal-First Papers Frank Xu Carnegie Mellon University, Bogdan Vasilescu Carnegie Mellon University, USA, Graham Neubig Carnegie Mellon University
13:30 90m Talk		Strategies for Reuse and Sharing among Data Scientists in Software Teams SEIP - Software Engineering in Practice Will Epperson Carnegie Mellon University, April Wang University of Michigan, Robert DeLine Microsoft Research, Steven M. Drucker Microsoft Research Pre-print Media Attached
13:30 90m Talk		Debugging with Stack Overflow: Web Search Behavior in Novice and Expert Programmers SEET - Software Engineering Education and Training Annie Li University of Michigan, Madeline Endres University of Michigan, Westley Weimer University of Michigan DOI Pre-print Media Attached
13:30 90m Talk		Static Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries (Best Artifact Award) Technical Track Quentin Stiévenart Vrije Universiteit Brussel, David Binkley Loyola University Maryland, Coen De Roover Vrije Universiteit Brussel DOI Pre-print Media Attached
13:30 90m Talk		Linear-time Temporal Logic guided Greybox Fuzzing Technical Track Ruijie Meng National University of Singapore, Singapore, Zhen Dong Fudan University, China, Jialin Li National University of Singapore, Singapore, Ivan Beschastnikh University of British Columbia, Abhik Roychoudhury National University of Singapore DOI Pre-print Media Attached
13:30 90m Talk		Individual differences limit predicting well-being and productivity using software repositories: a longitudinal industrial study Journal-First Papers Miikka Kuutila University of Oulu, Mika Mäntylä University of Oulu, Maëlick Claes University of Oulu, Marko Elovainio University of Helsinki, Bram Adams Queen's University, Kingston, Ontario Link to publication Media Attached
13:30 90m Talk		The Agile Success Model: A Mixed-methods Study of a Large-scale Agile Transformation Journal-First Papers Daniel Russo Department of Computer Science, Aalborg University Link to publication DOI Pre-print
13:30 90m Talk		PReach: A Heuristic for Probabilistic Reachability to Identify Hard to Reach Statements Technical Track Seemanta Saha University of California Santa Barbara, Mara Downing University of California, Santa Barbara, Tegan Brennan, Tevfik Bultan University of California, Santa Barbara Pre-print Media Attached
13:30 90m Talk		Active Learning of Discriminative Subgraph Patterns for API Misuse Detection Journal-First Papers Hong Jin Kang Singapore Management University, David Lo Singapore Management University Pre-print Media Attached File Attached
13:30 90m Talk		Toward Among-Device AI from On-Device AI with Stream Pipelines SEIP - Software Engineering in Practice MyungJoo Ham Samsung Electronics, Sangjung Woo Samsung Electronics, Jaeyun Jung Samsung Electronics, Wook Song Samsung Electronics, Gichan Jang Samsung Electronics, Yongjoo Ahn Samsung Electronics, Hyoungjoo Ahn Samsung Electronics Pre-print Media Attached
13:30 90m Talk		Integrating Hackathons into an Online Cybersecurity Course SEET - Software Engineering Education and Training Abasi-amefon Obot Affia University of Tartu, Estonia, Alexander Nolte University of Tartu, Raimundas Matulevičius University of Tartu, Estonia DOI Pre-print Media Attached
13:30 90m Talk		Verifying Dynamic Trait Objects in Rust SEIP - Software Engineering in Practice Alexa VanHattum Cornell University, Daniel Schwartz-Narbonne Amazon, n.n., Nathan Chong Amazon, Adrian Sampson Cornell University Pre-print Media Attached
13:30 90m Talk		Automatically Identifying Shared Root Causes of Test Breakages in SAP HANA SEIP - Software Engineering in Practice Gabin An KAIST, Juyeon Yoon Korea Advanced Institute of Science and Technology, Jeongju Sohn University of Luxembourg, Jingun Hong SAP Labs, Dongwon Hwang SAP Labs, Shin Yoo KAIST Pre-print Media Attached
13:30 90m Talk		Guiding Peer-feedback in Learning Software Design using UML SEET - Software Engineering Education and Training Satrio Adi Rukmono Institut Teknologi Bandung, Michel Chaudron Eindhoven University of Technology, The Netherlands Pre-print Media Attached
13:30 90m Talk		Fairness-aware Configuration of Machine Learning Libraries Technical Track Saeid Tizpaz-Niari University of Texas at El Paso, Ashish Kumar, Gang (Gary) Tan Pennsylvania State University, Ashutosh Trivedi University of Colorado Boulder DOI Pre-print Media Attached
13:30 90m Talk		Using Pre-Trained Models to Boost Code Review Automation Technical Track Rosalia Tufano Università della Svizzera Italiana, Simone Masiero Software Institute @ Università della Svizzera Italiana, Antonio Mastropaolo Università della Svizzera italiana, Luca Pascarella Università della Svizzera italiana (USI), Denys Poshyvanyk William and Mary, Gabriele Bavota Software Institute, USI Università della Svizzera italiana Pre-print Media Attached
13:30 90m Talk		Automatic Anti-Pattern Detection in Microservice Architectures based on Distributed Tracing SEIP - Software Engineering in Practice Tim Hubener ING Bank N.V., Yaping Luo ING; Eindhoven University of Technology, Pieter Vallen ING, Jonck van der Kogel ING Bank N.V., Tom Liefheid ING Bank N.V., Michel Chaudron Eindhoven University of Technology, The Netherlands Media Attached
13:30 90m Talk		Retrieving Data Constraint Implementations Using Fine-Grained Code Patterns Technical Track Juan Manuel Florez The University of Texas at Dallas, Jonathan Perry The University of Texas at Dallas, Shiyi Wei University of Texas at Dallas, Andrian Marcus University of Texas at Dallas Pre-print Media Attached
13:30 90m Talk		Verification of Consistency between Process Models, Object Life Cycles, and Context-dependent Semantic Specifications Journal-First Papers Ralph Hoch Institute of Computer Technology, TU Wien, Christoph Luckeneder Vienna University of Technology, Roman Popp TU Wien, Vienna, Austria, Hermann Kaindl Institute of Computer Technology, TU Wien Link to publication DOI Pre-print Media Attached
13:30 90m Talk		If a Human Can See It, So Should Your System: Reliability Requirements for Machine Vision Components Technical Track Boyue Caroline Hu University of Toronto, Lina Marsso University of Toronto, Krzysztof Czarnecki University of Waterloo, Canada, Rick Salay University of Toronto, Huakun Shen University of Toronto, Marsha Chechik University of Toronto DOI Pre-print Media Attached
13:30 90m Talk		Preparing Software Engineers to Develop Robot Systems SEET - Software Engineering Education and Training Carl Hildebrandt University of Virginia, Meriel von Stein University of Virginia, Trey Woodlief University of Virginia, Sebastian Elbaum University of Virginia DOI Pre-print Media Attached
13:30 90m Poster		EUGAIN. The European Network For Gender Balance in Informatics Technical Track Valentina Lenarduzzi University of Oulu, Barbora Buhnova Masaryk University, Letizia Jaccheri Norwegian University of Science and Technology
13:30 90m Talk		Detecting False Alarms from Automatic Static Analysis Tools: How Far are We? (Nominated for Distinguished Paper) Technical Track Hong Jin Kang Singapore Management University, Khai Loong Aw Singapore Management University, David Lo Singapore Management University DOI Pre-print Media Attached File Attached
13:30 90m Talk		An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags Journal-First Papers Christian D. Newman Rochester Institute of Technology, Michael J. Decker Bowling Green State University, Reem S. Alsuhaibani Kent State University, Anthony Peruma Rochester Institute of Technology, Mohamed Wiem Mkaouer Rochester Institute of Technology, Satyajit Mohapatra Rochester Institute of Technology, Tejal Vishnoi Rochester Institute of Technology, Marcos Zampieri Rochester Institute of Technology, Timothy Sheldon BNY Mellon, Emily Hill Drew University Link to publication DOI Pre-print Media Attached
13:30 90m Talk		Counterfactual Explanations for Models of Code SEIP - Software Engineering in Practice Jürgen Cito TU Wien and Meta, Işıl Dillig University of Texas at Austin, Vijayaraghavan Murali Meta Platforms, Inc., Satish Chandra Facebook Pre-print Media Attached
13:30 90m Talk		Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies Technical Track Jibesh Patra University of Stuttgart, Michael Pradel University of Stuttgart Pre-print Media Attached
13:30 90m Talk		Learning to Find Usages of Library Functions in Optimized Binaries Journal-First Papers Toufique Ahmed University of California at Davis, Prem Devanbu Department of Computer Science, University of California, Davis, Anand Ashok Sawant University of California, Davis Link to publication DOI Pre-print Media Attached
13:30 90m Talk		DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning Technical Track Eliska Kloberdanz Iowa State University, Kyle Kloberdanz Cape Privacy, Wei Le Iowa State University Pre-print Media Attached
13:30 90m Talk		Fuzzing Class Specifications Technical Track Facundo Molina University of Rio Cuarto and CONICET, Argentina, Marcelo d'Amorim Federal University of Pernambuco, Nazareno Aguirre University of Rio Cuarto and CONICET, Argentina Pre-print Media Attached
13:30 90m Talk		Journal First Submission of the Article: What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk Journal-First Papers Pooja Rani University of Bern, Sebastiano Panichella Zurich University of Applied Sciences, Manuel Leuenberger Software Composition Group, University of Bern, Switzerland, Mohammad Ghafari School of Computer Science, University of Auckland, Oscar Nierstrasz University of Bern, Switzerland Link to publication DOI Authorizer link Media Attached
