Improving API Knowledge Comprehensibility: A Context-Dependent Entity Detection and Context Completion Approach using LLM
Extracting API knowledge from Stack Overflow has become a crucial way to assist developers in using APIs. Existing research has primarily focused on extracting relevant API-related knowledge at the sentence level to enhance API documentation.
However, this level of extraction can lead to a loss of crucial context, especially when sentences contain context-dependent entities (i.e., entities whose understanding requires reference to the surrounding context), which may hinder developers’ understanding. To investigate this issue, we conducted an empirical study of 384 Stack Overflow posts and found that (1) approximately one-third of API functionality sentences contain context-dependent entities, and (2) these entities fall into two categories: Referential Context-Dependent Entities and Local Variable Context-Dependent Entities. In response, we developed a novel method, CEDCC, which combines an entity filtering strategy informed by our empirical study with a large language model (LLM) that constructs coreference chains to detect context-dependent entities. It then uses the LLM in a step-by-step manner to complete the context needed to understand these entities. To evaluate CEDCC, we constructed a dataset of 1,023 API knowledge sentences, including 567 context-dependent entities and their required contexts. The results show that CEDCC is effective at both detecting context-dependent entities and completing their context, achieving an F1-score of 0.865 and a BERTScore of 0.373 and significantly surpassing the baseline methods. Human evaluations further confirmed that CEDCC effectively improves the comprehensibility of API knowledge sentences.
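To make the idea of a referential context-dependent entity concrete, here is a minimal Python sketch. It is not the CEDCC implementation: the `Mention` type, both function names, the hard-coded coreference chains (standing in for what an LLM might return), and the bracket-style completion are all assumptions chosen for illustration. It flags a mention in a target sentence when its coreference chain also contains a mention in an earlier sentence, then attaches that antecedent to the sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str          # surface form, e.g. "this method"
    sentence_idx: int  # index of the sentence containing the mention


def detect_context_dependent(chains, target_idx):
    """Return (mention, antecedent) pairs for the target sentence.

    Assumption for this sketch: a mention is context-dependent when its
    chain also has a mention in an earlier sentence. The earliest such
    mention is used as the antecedent.
    """
    results = []
    for chain in chains:
        earlier = [m for m in chain if m.sentence_idx < target_idx]
        if not earlier:
            continue  # every mention is local to the target sentence
        antecedent = min(earlier, key=lambda m: m.sentence_idx)
        for m in chain:
            if m.sentence_idx == target_idx:
                results.append((m, antecedent))
    return results


def complete_context(sentence, pairs):
    """Append the antecedent in brackets after each flagged mention."""
    for mention, antecedent in pairs:
        sentence = sentence.replace(
            mention.text, f"{mention.text} [{antecedent.text}]", 1
        )
    return sentence


# Hypothetical chains; in practice these would come from a model.
chains = [
    [Mention("HashMap.put()", 0), Mention("this method", 1)],
    [Mention("null", 1)],
]
target = "Note that this method returns null when the key is new."
pairs = detect_context_dependent(chains, target_idx=1)
completed = complete_context(target, pairs)
print(completed)
```

In this toy input, "this method" is flagged because its chain reaches back to sentence 0, while "null" is not, since it has no earlier mention. Local variable entities, the second category named in the abstract, are not covered by this sketch.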
Wed 5 Mar | Displayed time zone: Eastern Time (US & Canada)
14:00 - 15:30 | API and Dependency Analysis | Research Papers | Room: L-1720 | Chair(s): Raula Gaikovina Kula (Osaka University)
14:00 | 15m | Talk | Analysing Software Supply Chains of Infrastructure as Code: Extraction of Ansible Plugin Dependencies | Research Papers | Ruben Opdebeeck (Vrije Universiteit Brussel), Bram Adams (Queen's University), Coen De Roover (Vrije Universiteit Brussel)
14:15 | 15m | Talk | Enhancing Automated Vulnerability Repair through Dependency Embedding and Pattern Store | Research Papers | Qingao Dong (Beihang University), Yuanzhang Lin (Beihang University), Xiang Gao (Beihang University), Hailong Sun (Beihang University)
14:30 | 15m | Talk | Improving API Knowledge Comprehensibility: A Context-Dependent Entity Detection and Context Completion Approach using LLM | Research Papers | Zhang Zhang (National University of Defense Technology), Xinjun Mao (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Kang Yang (National University of Defense Technology), Tanghaoran Zhang (National University of Defense Technology), Fei Gao (National University of Defense Technology), Xunhui Zhang (National University of Defense Technology, China)
14:45 | 15m | Talk | Pay Your Attention on Lib! Android Third-Party Library Detection via Feature Language Model | Research Papers | Dahan Pan (Shanghai Jiao Tong University), Yi Xu (Shanghai Jiao Tong University), Runhan Feng (Shanghai Jiao Tong University), Donghui Yu (Shanghai Jiao Tong University), Jiawen Chen (Shanghai Jiao Tong University), Ya Fang (Shanghai Jiao Tong University), Yuanyuan Zhang (Shanghai Jiao Tong University)
15:00 | 15m | Talk | THINK: Tackling API Hallucinations in LLMs via Injecting Knowledge | Research Papers | Jiaxin Liu (National University of Defense Technology), Yating Zhang (National University of Defense Technology), Deze Wang (National University of Defense Technology), Yiwei Li (National University of Defense Technology), Wei Dong (National University of Defense Technology)