Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods (ICSE 2023 - Technical Track)

Who

Daye Nam, Brad A. Myers, Bogdan Vasilescu, Vincent J. Hellendoorn

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 May 2023 15:17 - 15:19 at Meeting Room 105 - Posters 1
Fri 19 May 2023 11:30 - 11:45 at Meeting Room 103 - Program comprehension Chair(s): Oscar Chaparro

Abstract

Developers constantly learn new APIs, but often lack necessary information from documentation, resorting instead to popular question-and-answer platforms such as Stack Overflow. In this paper, we investigate how to use recent machine-learning-based knowledge extraction techniques to automatically identify pairs of comparable API methods, and the sentences describing the comparison, from Stack Overflow answers. We first built a prototype that can be stocked with a dataset of comparable API methods and provides tool-tips to users in search results and in API documentation. We conducted a user study with this tool based on a dataset of TensorFlow comparable API methods spanning 198 hand-annotated facts from Stack Overflow posts. This study confirmed that providing comparable API methods is useful in API learning: developers using our tool were significantly more aware of the comparable API methods and better understood the differences between them. We then created SOREL, a knowledge extraction tool for comparable API methods trained on our hand-annotated corpus, which achieves 71% precision and 55% recall at discovering our manually extracted facts and discovers 433 pairs of comparable API methods from thousands of unseen Stack Overflow posts. This work highlights the merit of jointly studying programming assistance tools and constructing the machine learning techniques that power them.

Link to Preprint

https://dayenam.com/assets/pdf/icse23_SOREL.pdf

Daye Nam

Carnegie Mellon University

Brad A. Myers

Carnegie Mellon University

Bogdan Vasilescu

Carnegie Mellon University

United States

Vincent J. Hellendoorn

Carnegie Mellon University

United States

Session Program

Wed 17 May

15:15 - 15:45	Posters 1 (Posters / Technical Track / Showcase) at Meeting Room 105

15:15 2m Poster		Distribution-aware Fairness Test Generation (Posters). Sai Sathiesh Rajan, Singapore University of Technology and Design, Singapore; Ezekiel Soremekun, Royal Holloway, University of London; Sudipta Chattopadhyay, Singapore University of Technology and Design; Yves Le Traon, University of Luxembourg, Luxembourg
15:17 2m Talk		Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods (Technical Track). Daye Nam, Carnegie Mellon University; Brad A. Myers, Carnegie Mellon University; Bogdan Vasilescu, Carnegie Mellon University; Vincent J. Hellendoorn, Carnegie Mellon University. Pre-print
15:19 2m Talk		Diver: Oracle-Guided SMT Solver Testing with Unrestricted Random Mutations (Technical Track). Jongwook Kim, Korea University; Sunbeom So, Korea University; Hakjoo Oh, Korea University
15:21 2m Talk		Demystifying Exploitable Bugs in Smart Contracts (Technical Track). Zhuo Zhang, Purdue University; Brian Zhang, Harrison High School (Tippecanoe); Wen Xu, PNM Labs; Zhiqiang Lin, The Ohio State University. Pre-print
15:23 2m Talk		An Empirical Study of Deep Learning Models for Vulnerability Detection (Technical Track). Benjamin Steenhoek, Iowa State University; Md Mahbubur Rahman, Iowa State University; Richard Jiles, Iowa State University; Wei Le, Iowa State University. Pre-print
15:25 2m Talk		MorphQ: Metamorphic Testing of the Qiskit Quantum Computing Platform (Technical Track). Matteo Paltenghi, University of Stuttgart, Germany; Michael Pradel, University of Stuttgart. Pre-print
15:27 2m Talk		Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction (Technical Track). Sungmin Kang, KAIST; Juyeon Yoon, Korea Advanced Institute of Science and Technology; Shin Yoo, KAIST. Pre-print
15:30 2m Talk		Automating Code-Related Tasks Through Transformers: The Impact of Pre-training (Technical Track). Rosalia Tufano, Università della Svizzera Italiana; Luca Pascarella, ETH Zurich; Gabriele Bavota, Software Institute, USI Università della Svizzera italiana
15:32 2m Talk		Generic Partition Refinement and Weighted Tree Automata (Showcase). Hans-Peter Deifel, Friedrich-Alexander University Erlangen-Nürnberg, Germany; Stefan Milius; Lutz Schröder, University of Erlangen-Nuremberg; Thorsten Wißmann, Friedrich-Alexander University Erlangen-Nürnberg. Link to publication, DOI, Pre-print
15:34 2m Talk		Learning Seed-Adaptive Mutation Strategies for Greybox Fuzzing (Technical Track). Myungho Lee, Korea University; Sooyoung Cha, Sungkyunkwan University; Hakjoo Oh, Korea University
15:36 2m Talk		Bug localization in game software engineering: evolving simulations to locate bugs in software models of video games (Showcase). Rodrigo Casamayor, SVIT Research Group, Universidad San Jorge; Lorena Arcega, San Jorge University; Francisca Pérez, SVIT Research Group, Universidad San Jorge; Carlos Cetina, San Jorge University, Spain. DOI
15:38 2m Poster		Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems (Posters). Zhensu Sun, The Hong Kong Polytechnic University; Xiaoning Du, Monash University, Australia; Fu Song, ShanghaiTech University; Shangwen Wang, National University of Defense Technology; Li Li, Beihang University
15:40 2m Talk		A Qualitative Study on the Implementation Design Decisions of Developers (Technical Track). Jenny T. Liang, Carnegie Mellon University; Maryam Arab, George Mason University; Minhyuk Ko, Virginia Tech; Amy Ko, University of Washington; Thomas LaToza, George Mason University. Pre-print
15:42 2m Poster		Closing the Loop for Software Remodularisation - REARRANGE: An Effort Estimation Approach for Software Clustering-based Remodularisation (Posters). Alvin Jian Jin Tan; Chun Yong Chong, Monash University Malaysia; Aldeida Aleti, Monash University

Fri 19 May

11:00 - 12:30	Program comprehension (Technical Track / Journal-First Papers) at Meeting Room 103. Chair(s): Oscar Chaparro, College of William and Mary

11:00 15m Talk		Code Comprehension Confounders: A Study of Intelligence and Personality (Journal-First Papers). Stefan Wagner, University of Stuttgart; Marvin Wyrich, Saarland University. Link to publication, Pre-print
11:15 15m Talk		Identifying Key Classes for Initial Software Comprehension: Can We Do It Better? (Technical Track). Weifeng Pan, Zhejiang Gongshang University, China; Xin Du, Zhejiang Gongshang University, China; Hua Ming, Oakland University; Dae-Kyoo Kim, Oakland University; Zijiang Yang, Xi'an Jiaotong University and GuardStrike Inc
11:30 15m Talk		Improving API Knowledge Discovery with ML: A Case Study of Comparable API Methods (Technical Track). Daye Nam, Carnegie Mellon University; Brad A. Myers, Carnegie Mellon University; Bogdan Vasilescu, Carnegie Mellon University; Vincent J. Hellendoorn, Carnegie Mellon University. Pre-print
11:45 15m Talk		Evidence Profiles for Validity Threats in Program Comprehension Experiments (Technical Track). Marvin Muñoz Barón, University of Stuttgart; Marvin Wyrich, Saarland University; Daniel Graziotin, University of Stuttgart; Stefan Wagner, University of Stuttgart. Pre-print
12:00 15m Talk		Developers’ Visuo-spatial Mental Model and Program Comprehension (Technical Track). Abir Bouraffa, University of Hamburg; Gian-Luca Fuhrmann; Walid Maalej, University of Hamburg. Pre-print
12:15 15m Talk		Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension (Technical Track). Shuzheng Gao, Harbin Institute of Technology; Cuiyun Gao, Harbin Institute of Technology; Chaozheng Wang, Harbin Institute of Technology; Jun Sun, Singapore Management University; David Lo, Singapore Management University; Yue Yu, College of Computer, National University of Defense Technology, Changsha 410073, China