Measuring What Matters: An Aggregate Metric for Assessing Enterprise Code Summaries (FSE 2025 - Ideas, Visions and Reflections)

Mon 23 - Fri 27 June 2025, Trondheim, Norway

Who

Ashita Saxena, Palanivel Kodeswaran, Sayandeep Sen, Srikanth Tamilselvam

Track

FSE 2025 Ideas, Visions and Reflections

Time Zone

All times are shown in the conference time zone, (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna. The GMT offsets are those in effect at the time of the conference.

When

Mon 23 Jun 2025, 15:00 - 15:10, at Aurora A (Code Search session). Chair(s): Xin Xia

Abstract

Evaluating the quality of code summaries is essential for enterprise software, where the complexity and scale of codebases introduce challenges that existing public code datasets and evaluation methods address inadequately. These methods, typically designed for small and straightforward code snippets, often overlook issues such as repetitiveness, verbosity, and incompleteness, which are particularly prominent in enterprise-level code summaries. While correctness has been studied extensively, other dimensions critical in enterprise contexts, such as distinctiveness and completeness, remain underexplored. To address these gaps, we propose a novel evaluation framework built on aggregated metrics tailored to enterprise needs, prioritizing both distinctiveness and completeness. The framework introduces metrics that penalize verbosity and redundancy while rewarding informativeness and alignment with the underlying code. Initial experiments on human-annotated enterprise Java datasets demonstrate the effectiveness of our approach, improving RMSE by 7.4% over the baselines. Correlation studies of our distinctiveness and completeness metrics with human ratings also show improvements of 32% and 5.3%, respectively, over the baselines.

Ashita Saxena

IBM Research

Palanivel Kodeswaran

IBM Research India

Sayandeep Sen

IBM Research India

Srikanth Tamilselvam

IBM Research

Session Program

Mon 23 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:30	Code Search (Research Papers / Journal First / Ideas, Visions and Reflections) at Aurora A. Chair(s): Xin Xia (Zhejiang University)

14:00	20m Talk	10 years later: revisiting how developers search for code (Research Papers). Kathryn Stolee (North Carolina State University), Tobias Welp (Google), Caitlin Sadowski, Sebastian Elbaum (University of Virginia)
14:20	20m Talk	Approaching Code Search for Python as a Translation Retrieval Problem with Dual Encoders (Journal First). Monoshiz Mahbub Khan (Rochester Institute of Technology), Zhe Yu (Rochester Institute of Technology)
14:40	20m Talk	Zero-Shot Cross-Domain Code Search without Fine-Tuning (Research Papers). Keyu Liang (Zhejiang University), Zhongxin Liu (Zhejiang University), Chao Liu (Chongqing University), Zhiyuan Wan (Zhejiang University), David Lo (Singapore Management University), Xiaohu Yang (Zhejiang University)
15:00	10m Talk	Measuring What Matters: An Aggregate Metric for Assessing Enterprise Code Summaries (Ideas, Visions and Reflections). Ashita Saxena (IBM Research), Palanivel Kodeswaran (IBM Research India), Sayandeep Sen (IBM Research India), Srikanth Tamilselvam (IBM Research)
15:10	20m Talk	MiSum: Multi-Modality Heterogeneous Code Graph Learning for Multi-Intent Binary Code Summarization (Research Papers). Kangchen Zhu (National University of Defense Technology), Zhiliang Tian (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Weiguo Chen (National University of Defense Technology), Zixuan Dong (National University of Defense Technology), Mingyue Leng (National University of Defense Technology), Xiaoguang Mao (National University of Defense Technology)

Information for Participants

Mon 23 Jun 2025, 14:00 - 15:30, at Aurora A (Code Search session). Chair(s): Xin Xia

Info for room Aurora A:

Aurora A is the first room in the Aurora wing.

When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.