ICPC 2026
Sun 12 - Mon 13 April 2026 Rio de Janeiro, Brazil
co-located with ICSE 2026

SQL query comprehension is a significant challenge in database and data analysis environments due to complex syntax, diverse join types, and deep nesting. Despite its critical role in backend development and data science, many queries, particularly within legacy systems, often lack adequate comments, which severely hinders code readability, maintainability, and knowledge transfer. Existing approaches to automated SQL comment generation face two main challenges: limited training datasets that inadequately represent real-world analytical queries involving multi-table joins, window functions, and complex aggregations, and an insufficient understanding of SQL-specific logical semantics and schema-related context by Large Language Models (LLMs), even after standard training. Our empirical analysis shows that even after continual pre-training and supervised fine-tuning, LLMs struggle to precisely understand complex SQL semantics, leading to inaccurate or incomplete comments. To address these challenges, we propose SQL-Commenter, an advanced comment generation method based on LLaMA-3.1-8B. First, we construct a comprehensive dataset containing longer, more complex SQL queries with expert-verified, detailed comments. Second, we perform continual pre-training using a large-scale SQL corpus to enhance the LLM’s understanding of SQL syntax and semantics. Then, we conduct supervised fine-tuning with our high-quality dataset. Finally, we introduce Direct Preference Optimization (DPO), which leverages human feedback to significantly improve comment quality. SQL-Commenter utilizes a preference-based loss function that encourages the LLM to increase the probability of preferred outputs while decreasing the probability of non-preferred outputs, thereby enhancing both fine-grained semantic learning, such as distinguishing between different join types, and context-dependent quality assessment based on business logic. 
We evaluate SQL-Commenter on the authoritative Spider and Bird benchmarks, where it significantly outperforms state-of-the-art baselines. On average, across these datasets, our method surpasses the strongest baseline (Qwen3-14B) by 9.29, 4.99, and 13.23 percentage points on BLEU-4, METEOR, and ROUGE-L, respectively. Moreover, human evaluation demonstrates the superior quality of comments generated by SQL-Commenter in terms of correctness, completeness, and naturalness.
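
The preference-based loss mentioned in the abstract can be illustrated with the standard DPO objective, which compares a trainable policy against a frozen reference model on a preferred and a non-preferred output. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example log-probabilities, and the value beta = 0.1 are hypothetical, and the abstract does not state the hyperparameters actually used.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full comment given
    the SQL query, under the policy or the frozen reference model.
    beta (assumed value) scales how strongly the policy is pushed away
    from the reference.
    """
    # Implicit rewards: how much more likely the policy makes each
    # output compared with the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-(chosen_reward - rejected_reward))


# Hypothetical log-probabilities: in the first case the policy favors the
# preferred comment, in the second it favors the non-preferred one.
loss_good = dpo_loss(-12.0, -20.0, -15.0, -15.0)
loss_bad = dpo_loss(-20.0, -12.0, -15.0, -15.0)
print(loss_good < loss_bad)
```

Minimizing this quantity raises the probability of the preferred comment relative to the reference model and lowers that of the non-preferred one, which is the behavior the abstract describes.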

SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization (ICPC_SQL-Commenter_LeiYu.pdf, 1.22 MiB)

Mon 13 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Session 5 - Summarization, Documentation, and Code Review
Research Track / Vaclav Rajlich Early Career Award / ICPC Program / Journal First at Europa II
Chair(s): Masud Rahman Dalhousie University
11:00
10m
Talk
Vaclav Rajlich Award
Vaclav Rajlich Early Career Award
Marvin Wyrich Saarland University
11:10
10m
Talk
RepoMind: Enhancing Repository-Level Code Generation via LLM Reasoning over Structured Repository Documentation
Research Track
Songwen Gong South China University of Technology, Mengzhen Wang South China University of Technology, Jiexin Wang South China University of Technology, Yi Cai School of Software Engineering, South China University of Technology, Guangzhou, China
11:20
10m
Talk
SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization
Research Track
Lei Yu Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, China, Peng Wang Institute of Information Engineering, Chinese Academy of Sciences, Jingyuan Zhang Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, China, Xin Wang Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Jia Xu Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Li Yang Institute of Software, Chinese Academy of Sciences, Changzhi Deng Institute of Software, Chinese Academy of Sciences, Jiajia Ma Institute of Software, Chinese Academy of Sciences, China, Fengjun Zhang Institute of Software, Chinese Academy of Sciences, China
Pre-print, Media Attached, File Attached
11:30
10m
Talk
Studying Quality Improvements Recommended via Manual and Automated Code Review
Research Track
Giuseppe Crupi Università della Svizzera italiana, Rosalia Tufano Università della Svizzera Italiana, Gabriele Bavota Software Institute @ Università della Svizzera Italiana
Pre-print
11:40
10m
Talk
Towards Universal Segmentation for Log Parsing
Research Track
Van-Hoang Le University of Luxembourg, Luxembourg, Domenico Bianculli University of Luxembourg, Huy-Trung Nguyen Posts and Telecommunications Institute of Technology
Pre-print
11:50
10m
Talk
DPS: Design Pattern Summarisation Using Code Features
Journal First
Najam Nazar Monash University, Sameer Sikka University of Melbourne, Christoph Treude Singapore Management University
12:00
10m
Talk
On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study
Research Track
Antonio Vitale Politecnico di Torino, University of Molise, Emanuela Guglielmi University of Molise, Simone Scalabrino University of Molise, Rocco Oliveto University of Molise
Pre-print
12:10
20m
Live Q&A
Joint QA and Discussion
ICPC Program