
Prior studies have shown that approaches that generate an answer summary for a given technical query on Software Question and Answer (SQA) sites are desired. We find that existing approaches are assessed solely through user studies. Hence, a new user study must be performed every time a new approach is introduced. This is time-consuming, slows down the development of new approaches, and yields results that may not be comparable across studies. A benchmark with ground-truth summaries is needed to complement assessment through user studies. Unfortunately, no such benchmark exists for answer summarization of technical queries from SQA sites.

To fill this gap, we manually construct a high-quality benchmark that enables automatic evaluation of answer summarization for technical queries from SQA sites. It contains 111 query-summary pairs extracted from 382 Stack Overflow answers with 2,014 sentence candidates. Using the benchmark, we comprehensively evaluate the performance of existing approaches and find that there is still much room for improvement.

Motivated by these results, we propose a new approach, TechSumBot, with three key modules: 1) a Usefulness Ranking module, 2) a Centrality Estimation module, and 3) a Redundancy Removal module. We evaluate TechSumBot both automatically (i.e., using our benchmark) and manually (i.e., via a user study). The results of both evaluations consistently show that TechSumBot outperforms the best-performing baseline approaches from both the SE and NLP domains by a large margin: 10.83%–14.90%, 32.75%–36.59%, and 12.61%–17.54% in terms of ROUGE-1, ROUGE-2, and ROUGE-L in the automatic evaluation, and 5.79%–9.23% and 17.03%–17.68% in terms of average usefulness and diversity scores in the human evaluation. This indicates that automatic evaluation with our benchmark can uncover findings similar to those found through user studies. More importantly, automatic evaluation costs much less, especially when it is used to assess a new approach. We also conducted an ablation study, which shows that each module contributes to the overall performance of TechSumBot. We release the benchmark as well as the replication package of our experiment at https://anonymous.4open.science/r/TECHSUMBOT.
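To make the automatic evaluation above more concrete, here is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can score a generated summary against a ground-truth summary from a benchmark like this one. It assumes whitespace tokenization, no stemming or stopword removal, and reports the F1 variant; the abstract does not say which ROUGE implementation or variant the authors used, so treat this as an illustration rather than their evaluation script. The function names (`rouge_n`, `rouge_l`, `lcs_length`) are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps min counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Toy example; these sentences are not taken from the benchmark.
    cand = "use a list comprehension to filter".split()
    ref = "use a list comprehension to filter items".split()
    print(round(rouge_n(cand, ref, 1), 3))  # 0.923
    print(round(rouge_n(cand, ref, 2), 3))  # 0.909
    print(round(rouge_l(cand, ref), 3))     # 0.923
```

Because each score needs only a candidate and a fixed reference, a new approach can be rescored against the whole benchmark in seconds, which is the cost advantage over repeating a user study that the abstract points out.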

Thu 13 Oct

Displayed time zone: Eastern Time (US & Canada)

10:00 - 12:00
Technical Session 22 - Code Summarization and Recommendation
Research Papers / NIER Track / Journal-first Papers / Industry Showcase at Banquet A
Chair(s): Houari Sahraoui Université de Montréal
10:00
20m
Research paper
Identifying Solidity Smart Contract API Documentation Errors
Research Papers
Chenguang Zhu The University of Texas at Austin, Ye Liu Nanyang Technological University, Xiuheng Wu Nanyang Technological University, Singapore, Yi Li Nanyang Technological University
Pre-print
10:20
10m
Vision and Emerging Results
Few-shot training LLMs for project-specific code-summarization
NIER Track
Toufique Ahmed University of California at Davis, Prem Devanbu Department of Computer Science, University of California, Davis
DOI Pre-print
10:30
20m
Research paper
Answer Summarization for Technical Queries: Benchmark and New Approach
Research Papers
Chengran Yang Singapore Management University, Bowen Xu School of Information Systems, Singapore Management University, Ferdian Thung Singapore Management University, Yucen Shi Singapore Management University, Ting Zhang Singapore Management University, Zhou Yang Singapore Management University, Xin Zhou, Jieke Shi Singapore Management University, Junda He Singapore Management University, DongGyun Han Royal Holloway, University of London, David Lo Singapore Management University
10:50
20m
Paper
Code Structure Guided Transformer for Source Code Summarization (Virtual)
Journal-first Papers
Shuzheng Gao Harbin Institute of Technology, Cuiyun Gao Harbin Institute of Technology, Yulan He University of Warwick, Jichuan Zeng The Chinese University of Hong Kong, Lun Yiu Nie Tsinghua University, Xin Xia Huawei Software Engineering Application Technology Lab, Michael Lyu The Chinese University of Hong Kong
11:10
10m
Vision and Emerging Results
Taming Multi-Output Recommenders for Software Engineering (Virtual)
NIER Track
Christoph Treude University of Melbourne
11:20
20m
Industry talk
MV-HAN: A Hybrid Attentive Networks based Multi-View Learning Model for Large-scale Contents Recommendation (Virtual)
Industry Showcase
Ge Fan Tencent Inc., Chaoyun Zhang Tencent Inc., Kai Wang Tencent Inc., Junyang Chen Shenzhen University
DOI Pre-print
11:40
20m
Research paper
Which Exception Shall We Throw? (Virtual)
Research Papers
Hao Zhong Shanghai Jiao Tong University