ICPC 2024
Sun 14 - Sat 20 April 2024, Lisbon, Portugal
co-located with ICSE 2024

When comprehending code, a helping hand may come from the natural language comments documenting it, which, unfortunately, are not always there. To support developers in such a scenario, several techniques have been presented to automatically generate natural language summaries for a given code. Most recent approaches exploit deep learning (DL) to automatically document classes or functions, while very little effort has been devoted to more fine-grained documentation (e.g., documenting code snippets or even a single statement). Such a design choice is dictated by the availability of training data: for example, in the case of Java, it is easy to create datasets composed of ⟨method, javadoc⟩ pairs that can be fed to DL models to teach them how to summarize a method. Such a comment-to-code linking is instead non-trivial when it comes to inner comments (i.e., comments within a function) documenting a few statements. In this work, we take all the steps needed to train a DL model to automatically document code snippets. First, we manually built a dataset featuring 6.6k comments that have been (i) classified based on their type (e.g., code summary, TODO), and (ii) linked to the code statements they document. Second, we used such a dataset to train a multi-task DL model that takes a comment as input and is able to (i) classify whether it represents a "code summary" or not, and (ii) link it to the code statements it documents. Our trained model identifies code summaries with 83% accuracy and links them to the documented lines of code with recall and precision higher than 80%. Third, we ran this model on 10k open-source projects, automatically identifying code summaries and linking them to the related documented code. This allowed us to build a large-scale dataset of documented code snippets that was then used to train a new DL model able to automatically document code snippets. A comparison with state-of-the-art baselines shows the superiority of the proposed approach, which, however, is still far from representing an accurate solution for snippet summarization.
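
The abstract notes that ⟨method, javadoc⟩ pairs are easy to mine for Java, in contrast to inner comments, whose comment-to-code linking the paper addresses. As a rough illustration of that mining step only, here is a minimal sketch; it is not taken from the paper, and the JavaParser dependency, the class name MethodJavadocMiner, and the pairing logic are assumptions made purely for illustration.

// Hypothetical sketch (not from the paper's artifacts): mining
// <method, javadoc> pairs from a single Java source file, assuming the
// open-source JavaParser library (com.github.javaparser) is on the classpath.
import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MethodJavadocMiner {

    /** Returns one <method source, javadoc text> pair per documented method. */
    public static List<Map.Entry<String, String>> minePairs(File javaFile)
            throws FileNotFoundException {
        CompilationUnit cu = StaticJavaParser.parse(javaFile);
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (MethodDeclaration method : cu.findAll(MethodDeclaration.class)) {
            // Only methods with an attached Javadoc comment yield a training pair.
            // Note: method.toString() pretty-prints the node and may still include
            // the Javadoc itself; a real pipeline would strip it before training.
            method.getJavadocComment().ifPresent(javadoc ->
                    pairs.add(new SimpleEntry<>(method.toString(),
                                                javadoc.getContent().trim())));
        }
        return pairs;
    }

    public static void main(String[] args) throws FileNotFoundException {
        for (Map.Entry<String, String> pair : minePairs(new File(args[0]))) {
            System.out.println("JAVADOC: " + pair.getValue());
            System.out.println("METHOD:  " + pair.getKey());
            System.out.println("--------");
        }
    }
}

Inner comments offer no such ready-made anchor between comment and documented statements, which is why the paper builds its 6.6k-comment dataset manually before training the multi-task model.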

Mon 15 Apr

Displayed time zone: Lisbon

11:00 - 12:30
11:00
10m
Talk
Towards Summarizing Code Snippets Using Pre-Trained Transformers (ICPC Full paper)
Research Track
Antonio Mastropaolo Università della Svizzera italiana, Matteo Ciniselli Università della Svizzera Italiana, Luca Pascarella ETH Zurich, Rosalia Tufano Università della Svizzera Italiana, Emad Aghajani Software Institute, USI Università della Svizzera italiana, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
Pre-print
11:10
10m
Talk
Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants (ICPC Full paper)
Research Track
Vincenzo Corso University of Milano - Bicocca, Leonardo Mariani University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Oliviero Riganelli University of Milano - Bicocca
Pre-print
11:20
10m
Talk
Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot (ICPC Full paper)
Research Track
Ionut Daniel Fagadau University of Milano - Bicocca, Leonardo Mariani University of Milano-Bicocca, Daniela Micucci University of Milano-Bicocca, Italy, Oliviero Riganelli University of Milano - Bicocca
Pre-print
11:30
10m
Talk
Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies (ICPC Full paper)
Research Track
Yilun Liu Huawei co. LTD, Shimin Tao University of Science and Technology of China; Huawei co. LTD, Weibin Meng Huawei co. LTD, Jingyu Wang, Wenbing Ma Huawei co. LTD, Yuhang Chen University of Science and Technology of China, Yanqing Zhao Huawei co. LTD, Hao Yang Huawei co. LTD, Yanfei Jiang Huawei co. LTD
Pre-print
11:40
10m
Talk
Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization (ICPC RENE Paper)
Replications and Negative Results (RENE)
Jiliang Li Vanderbilt University, Yifan Zhang Vanderbilt University, Zachary Karas Vanderbilt University, Collin McMillan University of Notre Dame, Kevin Leach Vanderbilt University, Yu Huang Vanderbilt University
Pre-print
11:50
10m
Talk
Knowledge-Aware Code Generation with Large Language Models (ICPC Full paper)
Research Track
Tao Huang Shandong Normal University, Zhihong Sun Shandong Normal University, Zhi Jin Peking University, Ge Li Peking University, Chen Lyu Shandong Normal University
Pre-print
12:00
8m
Talk
Enhancing Source Code Representations for Deep Learning with Static Analysis (ICPC ERA Paper)
Early Research Achievements (ERA)
Xueting Guan University of Melbourne, Christoph Treude Singapore Management University
Pre-print
12:08
8m
Talk
AthenaLLM: Supporting Experiments with Large Language Models in Software Development (ICPC Tools)
Tool Demonstration
Benedito Fernando Albuquerque de Oliveira Federal University of Pernambuco, Fernando Castor University of Twente and Federal University of Pernambuco
12:16
14m
Talk
AI-Assisted Program Comprehension: Panel with Speakers
Discussion