Answer Summarization for Technical Queries: Benchmark and New Approach
Prior studies have demonstrated that approaches that generate an answer summary for a given technical query on Software Question and Answer (SQA) sites are in demand. We find that existing approaches are assessed solely through user studies. Hence, a new user study must be performed every time a new approach is introduced; this is time-consuming, slows down the development of new approaches, and yields results that may not be comparable across studies. A benchmark with ground-truth summaries is needed to complement assessment through user studies. Unfortunately, no such benchmark exists for answer summarization for technical queries from SQA sites.
To fill this gap, we manually construct a high-quality benchmark that enables automatic evaluation of answer summarization for technical queries from SQA sites. It contains 111 query-summary pairs extracted from 382 Stack Overflow answers with 2,014 sentence candidates. Using the benchmark, we comprehensively evaluate the performance of existing approaches and find that there is still substantial room for improvement.
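Automatic evaluation on such a benchmark typically scores a generated summary against the ground-truth summary with ROUGE-style n-gram overlap. The sketch below is an illustrative, recall-oriented ROUGE-N implementation written from scratch; it is not the authors' actual scoring script (which would normally use an established ROUGE package), and the example sentences are made up for demonstration.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """Recall-oriented ROUGE-N: the fraction of reference n-grams
    that also appear in the candidate summary (with clipped counts)."""
    def ngrams(text: str, n: int) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())


# Toy example: 6 of the 8 reference unigrams appear in the candidate.
ref = "use a thread pool to limit concurrent connections"
cand = "limit concurrent connections with a thread pool"
print(round(rouge_n(ref, cand, n=1), 2))  # 0.75
```

ROUGE-2 and ROUGE-L (used in the paper's evaluation) follow the same idea with bigrams and longest common subsequences, respectively.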
Motivated by these results, we propose a new approach, TechSumBot, with three key modules: 1) a Usefulness Ranking module, 2) a Centrality Estimation module, and 3) a Redundancy Removal module. We evaluate TechSumBot both automatically (i.e., using our benchmark) and manually (i.e., via a user study). The results of both evaluations consistently demonstrate that TechSumBot outperforms the best-performing baseline approaches from both the SE and NLP domains by a large margin: by 10.83%–14.90%, 32.75%–36.59%, and 12.61%–17.54% in terms of ROUGE-1, ROUGE-2, and ROUGE-L in the automatic evaluation, and by 5.79%–9.23% and 17.03%–17.68% in terms of average usefulness and diversity scores in the human evaluation. This highlights that automatic evaluation on our benchmark can uncover findings similar to those found through user studies. More importantly, automatic evaluation has a much lower cost, especially when it is used to assess a new approach. Additionally, we conduct an ablation study, which demonstrates that each module contributes to boosting the overall performance of TechSumBot. We release the benchmark as well as the replication package of our experiments at https://anonymous.4open.science/r/TECHSUMBOT.
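To make the three-stage design concrete, here is a minimal, self-contained sketch of such a pipeline using bag-of-words cosine similarity. All scoring functions and the threshold value are illustrative stand-ins, not TechSumBot's actual models: usefulness is approximated by query-sentence similarity, centrality by average similarity to the other candidates, and redundancy removal by a similarity cap against already-selected sentences.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(query: str, sentences: list[str], k: int = 2,
              redundancy_cap: float = 0.6) -> list[str]:
    bows = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    # 1) Usefulness Ranking: approximated by query-sentence similarity.
    usefulness = [cosine(q, b) for b in bows]
    # 2) Centrality Estimation: average similarity to all candidates.
    centrality = [sum(cosine(b, o) for o in bows) / len(bows) for b in bows]
    order = sorted(range(len(sentences)),
                   key=lambda i: usefulness[i] + centrality[i], reverse=True)
    # 3) Redundancy Removal: skip sentences too similar to picked ones.
    picked = []
    for i in order:
        if all(cosine(bows[i], bows[j]) < redundancy_cap for j in picked):
            picked.append(i)
        if len(picked) == k:
            break
    return [sentences[i] for i in sorted(picked)]


candidates = [
    "Use a thread pool to handle many connections",
    "A thread pool handles many connections efficiently",
    "Restart the server daily",
]
# The second sentence is dropped as redundant with the first.
print(summarize("handle many connections", candidates, k=2))
```

In TechSumBot itself each stage is learned rather than hand-crafted, but the control flow (rank for usefulness, re-weight by centrality, then filter redundant candidates) follows this shape.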
Thu 13 Oct (displayed time zone: Eastern Time, US & Canada)
10:00 - 12:00 | Technical Session 22 - Code Summarization and Recommendation (Research Papers / NIER Track / Journal-first Papers / Industry Showcase) at Banquet A. Chair(s): Houari Sahraoui (Université de Montréal)
10:00 20m Research paper | Identifying Solidity Smart Contract API Documentation Errors (Research Papers). Chenguang Zhu (The University of Texas at Austin), Ye Liu (Nanyang Technological University), Xiuheng Wu (Nanyang Technological University, Singapore), Yi Li (Nanyang Technological University). Pre-print
10:20 10m Vision and Emerging Results | Few-shot training LLMs for project-specific code-summarization (NIER Track). Toufique Ahmed (University of California at Davis), Prem Devanbu (Department of Computer Science, University of California, Davis). DOI, Pre-print
10:30 20m Research paper | Answer Summarization for Technical Queries: Benchmark and New Approach (Research Papers). Chengran Yang (Singapore Management University), Bowen Xu (School of Information Systems, Singapore Management University), Ferdian Thung (Singapore Management University), Yucen Shi (Singapore Management University), Ting Zhang (Singapore Management University), Zhou Yang (Singapore Management University), Xin Zhou, Jieke Shi (Singapore Management University), Junda He (Singapore Management University), DongGyun Han (Royal Holloway, University of London), David Lo (Singapore Management University)
10:50 20m Paper (Virtual) | Code Structure Guided Transformer for Source Code Summarization (Journal-first Papers). Shuzheng Gao (Harbin Institute of Technology), Cuiyun Gao (Harbin Institute of Technology), Yulan He (University of Warwick), Jichuan Zeng (The Chinese University of Hong Kong), Lun Yiu Nie (Tsinghua University), Xin Xia (Huawei Software Engineering Application Technology Lab), Michael Lyu (The Chinese University of Hong Kong)
11:10 10m Vision and Emerging Results (Virtual) | Taming Multi-Output Recommenders for Software Engineering (NIER Track). Christoph Treude (University of Melbourne)
11:20 20m Industry talk (Virtual) | MV-HAN: A Hybrid Attentive Networks based Multi-View Learning Model for Large-scale Contents Recommendation (Industry Showcase). Ge Fan (Tencent Inc.), Chaoyun Zhang (Tencent Inc.), Kai Wang (Tencent Inc.), Junyang Chen (Shenzhen University). DOI, Pre-print
11:40 20m Research paper (Virtual) | Which Exception Shall We Throw? (Research Papers). Hao Zhong (Shanghai Jiao Tong University)