ICSE 2024
Fri 12 - Sun 21 April 2024 Lisbon, Portugal
Fri 19 Apr 2024 11:15 - 11:30 at Glicínia Quartin - Evolution 4 Chair(s): Wesley Assunção

Code idioms are commonly used patterns, techniques, or practices that help solve particular problems or specific tasks across multiple software projects. They can improve code quality, performance, and maintainability, and they promote program standardization and reuse across projects. However, identifying code idioms is challenging, and existing approaches suffer from three main limitations. First, they struggle to recognize idioms that span noncontiguous code lines. Second, they have difficulty identifying idioms with intricate data flow and code structures. Third, they extract only dataset-specific idioms, so common idioms or well-established code/design patterns that rarely appear in the datasets cannot be identified.
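As a hypothetical illustration (not taken from the paper), the sketch below shows a familiar Java idiom for reading lines from a character stream. The statements that make up the idiom (opening a BufferedReader in try-with-resources, looping until readLine() returns null) are separated by task-specific filtering code, which is the kind of noncontiguity described above. The class and method names are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of IdioMine or its dataset.
class LineReadingIdiom {

    static List<String> readNonEmptyLines(Reader source) {
        List<String> result = new ArrayList<>();
        // Idiom, part 1: wrap the source in a BufferedReader that is closed automatically.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // Idiom, part 2: read until readLine() signals end of stream with null.
            while ((line = reader.readLine()) != null) {
                // Task-specific code that sits between the idiom's statements.
                if (line.isBlank()) {
                    continue;
                }
                result.add(line.strip());
            }
        } catch (IOException e) {
            // Idiom, part 3: surface the checked exception as an unchecked one.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readNonEmptyLines(new StringReader("alpha\n\n  beta  \n")));
    }
}
```

A miner that only considers contiguous token or line sequences would see the filtering code as part of the pattern, or miss the pattern entirely when that code varies between projects.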

To overcome these limitations, we propose a novel approach, named IdioMine, to automatically extract generic and specific idioms from both Java projects and libraries. We perform program analysis on Java functions to transform them into concise program dependence graphs (PDGs), which integrate the data flow and control flow of code fragments. We then develop a novel chain structure, the Data-driven Control Chain (DCC), to extract sub-idioms that possess contiguous semantic meanings from the PDGs. After that, we use GraphCodeBERT to generate code embeddings of these sub-idioms and perform density-based clustering to obtain frequent sub-idioms. We use heuristic rules to identify interrelated sub-idioms among the frequent ones. Finally, we employ ChatGPT to synthesize interrelated sub-idioms into potential code idioms and infer real idioms from them.
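The clustering step can be pictured with a minimal sketch. The abstract does not name the density-based algorithm or its settings, so everything below is an assumption for illustration: it uses DBSCAN with cosine distance, toy 2-D vectors stand in for GraphCodeBERT embeddings, and the eps and minPts values are arbitrary. Vectors are assumed to be nonzero.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of density-based clustering over embedding vectors (DBSCAN).
class SubIdiomClustering {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    // Cosine distance between two nonzero vectors of equal length.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Indices of all points within eps of point p (p itself is included).
    static List<Integer> neighbors(double[][] points, int p, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int q = 0; q < points.length; q++) {
            if (cosineDistance(points[p], points[q]) <= eps) {
                out.add(q);
            }
        }
        return out;
    }

    // Returns one label per point: a cluster id starting at 0, or NOISE.
    static int[] dbscan(double[][] points, double eps, int minPts) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int cluster = 0;
        for (int p = 0; p < points.length; p++) {
            if (labels[p] != UNVISITED) continue;
            List<Integer> seeds = neighbors(points, p, eps);
            if (seeds.size() < minPts) {   // not dense enough to start a cluster
                labels[p] = NOISE;
                continue;
            }
            labels[p] = cluster;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) labels[q] = cluster;  // border point joins the cluster
                if (labels[q] != UNVISITED) continue;
                labels[q] = cluster;
                List<Integer> reachable = neighbors(points, q, eps);
                if (reachable.size() >= minPts) queue.addAll(reachable);  // q is a core point
            }
            cluster++;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] toyEmbeddings = {
            {1.0, 0.0}, {0.99, 0.1}, {0.98, 0.15},  // three similar sub-idioms
            {0.0, 1.0}, {0.1, 0.99},                // two similar sub-idioms
            {-1.0, -1.0}                            // an outlier
        };
        System.out.println(Arrays.toString(dbscan(toyEmbeddings, 0.05, 2)));
    }
}
```

In this toy run, the dense groups would correspond to frequent sub-idioms, while the point labeled as noise would be treated as infrequent. A density-based method fits this setting because the number of distinct sub-idioms is not known in advance.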

We conduct experiments and a user study to evaluate IdioMine’s correctness and the practical value of the extracted idioms. Our experimental results show that IdioMine extracts more idioms and performs better on most metrics. When extracting idioms from libraries, IdioMine outperforms Haggis and ChatGPT by 30.0% and 42.7%, respectively, in Idiom Set Precision (ISP) and by 13.1% and 26.3%, respectively, in Idiom Coverage (IC). IdioMine also extracts idioms almost twice the size of those extracted by the baselines, demonstrating its ability to identify complete idioms. Our user study indicates that idioms extracted by IdioMine are well-formed and semantically clear. Moreover, we conduct a qualitative and quantitative analysis to investigate the primary functionalities of IdioMine’s extracted idioms from various projects and libraries.

Fri 19 Apr

Displayed time zone: Lisbon

11:00 - 12:30
11:00
15m
Talk
MUT: Human-in-the-Loop Unit Test Migration
Research Track
Yi Gao Zhejiang University, Xing Hu Zhejiang University, Tongtong Xu Huawei, Xin Xia Huawei Technologies, David Lo Singapore Management University, Xiaohu Yang Zhejiang University
11:15
15m
Talk
Streamlining Java Programming: Uncovering Well-Formed Idioms with IdioMine
Research Track
Yanming Yang Zhejiang University, Xing Hu Zhejiang University, Xin Xia Huawei Technologies, David Lo Singapore Management University, Xiaohu Yang Zhejiang University
11:30
15m
Talk
Fine-grained, accurate and scalable source differencing
Research Track
Jean-Rémy Falleri Bordeaux INP, Matias Martinez Universitat Politècnica de Catalunya (UPC)
11:45
15m
Talk
A Catalog of Unintended Software Dependencies in Multi-Lingual Systems at ASML
Software Engineering in Practice
Tom Groot Eindhoven University of Technology & ASML, Lina Ochoa Eindhoven University of Technology, Bogdan Lazar ASML, Jacob Krüger Eindhoven University of Technology
12:00
7m
Talk
Runtime Evolution of Bitcoin’s Consensus Rules
Journal-first Papers
Jakob Svennevik Notland Norwegian University of Science and Technology, Mariusz Nowostawski Norwegian University of Science and Technology, Jingyue Li Norwegian University of Science and Technology (NTNU)
12:07
7m
Talk
CfgNet: A Framework for Tracking Equality-Based Configuration Dependencies Across a Software Project
Journal-first Papers
Sebastian Simon Leipzig University, Nicolai Ruckel Secunet Security Networks AG, Norbert Siegmund Leipzig University
12:14
7m
Talk
Hyperparameter Optimization for AST Differencing
Journal-first Papers
Matias Martinez Universitat Politècnica de Catalunya (UPC), Jean-Rémy Falleri Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, UMR5800, F-33400 Talence, France, Martin Monperrus KTH Royal Institute of Technology