Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition
This program is tentative and subject to change.
God Header Files, large header files included by numerous other code files, present significant challenges for code comprehension and maintenance while also increasing recompilation time. Existing approaches leverage various code similarity metrics to decompose such header files, but these metrics do not always capture the code’s functional essence accurately. Large Language Models (LLMs), with their advanced capabilities in code understanding and generation, offer a promising alternative for producing more effective refactorings. However, LLMs face limitations with lengthy code files due to token restrictions and reduced effectiveness in processing long inputs. Additionally, purely LLM-based solutions often suffer from hallucination, producing incomplete or spurious decomposition results. To address these challenges, we propose HFDecomposer, a hybrid approach that enhances LLMs with staged grouping and dehallucination techniques to effectively decompose header files. Our approach introduces a two-stage grouping framework for lengthy header files: it first groups strongly related code entities using traditional similarity metrics, then feeds group summaries to the LLM for higher-level semantic aggregation. To mitigate LLM hallucinations, we enhance prompts with factual knowledge extracted from static analysis, detect errors in LLM output, and make necessary corrections by reassigning missing entities and resolving cyclic dependencies. Our evaluation on real-world header file decomposition refactorings demonstrates that our method effectively overcomes the limitations of purely LLM-based techniques and outperforms the traditional state-of-the-art approach by 11%, delivering more accurate and reliable decomposition results. Our approach enables LLMs to handle lengthy header files efficiently, significantly reduces hallucinations, and ensures the reliability and practicality of the final decomposition.
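The correction step described in the abstract (reassigning missing entities and resolving cyclic dependencies) can be illustrated with a minimal sketch. The abstract does not say how HFDecomposer performs either repair, so everything here is an assumption made for illustration: the function names `repair_grouping` and `merge_cyclic_groups`, the input shapes, the rule of placing a missing entity in the group it shares the most dependency edges with, and the choice to break cycles by merging the groups involved.

```python
from collections import defaultdict

def repair_grouping(entities, deps, llm_groups):
    """Return {entity: group} covering every real entity exactly once.

    entities:   set of entity names found by static analysis (hypothetical input)
    deps:       {entity: set of entities it depends on}
    llm_groups: {group name: list of entity names} as proposed by the LLM;
                assumed to contain at least one group
    """
    assigned = {}
    # Drop spurious names the LLM invented and keep only the first
    # assignment of any entity it listed more than once.
    for group, members in llm_groups.items():
        for name in members:
            if name in entities and name not in assigned:
                assigned[name] = group
    # Reassign entities the LLM omitted: vote for the group that shares
    # the most dependency edges (in either direction) with the entity.
    groups = list(llm_groups)
    for name in sorted(entities - assigned.keys()):
        votes = defaultdict(int)
        for other, group in assigned.items():
            if other in deps.get(name, ()) or name in deps.get(other, ()):
                votes[group] += 1
        assigned[name] = max(groups, key=lambda g: votes[g])  # ties: first group
    return assigned

def merge_cyclic_groups(assigned, deps):
    """Merge groups that depend on each other cyclically (assumed strategy)."""
    # Lift entity-level dependencies to group-level edges.
    edges = defaultdict(set)
    for src, targets in deps.items():
        for dst in targets:
            if src in assigned and dst in assigned and assigned[src] != assigned[dst]:
                edges[assigned[src]].add(assigned[dst])

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    groups = sorted(set(assigned.values()))
    reach = {g: reachable(g) for g in groups}
    # Two groups lie on a common cycle when each can reach the other;
    # every such set is collapsed into its alphabetically first member.
    rep = {
        g: min(h for h in groups if h == g or (h in reach[g] and g in reach[h]))
        for g in groups
    }
    return {name: rep[g] for name, g in assigned.items()}
```

Merging is the bluntest way to remove a cycle between resulting header files; moving individual entities across groups would keep the decomposition finer-grained, at the cost of a more involved search.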
Tue 18 Nov. Displayed time zone: Seoul
14:00 - 15:30
14:00 | 10m | Talk | Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition | Research Papers | Yue Wang Peking University, Jiaxuan Sun Peking University, Yanzhen Zou Peking University, Bing Xie Peking University
14:10 | 10m | Research paper | Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution | Research Papers | Raffi Khatchadourian CUNY Hunter College, Tatiana Castro Vélez University of Puerto Rico, Rio Piedras Campus, Mehdi Bagherzadeh Oakland University, Nan Jia City University of New York (CUNY) Graduate Center, Anita Raja City University of New York (CUNY) Hunter College
14:20 | 10m | Talk | An Empirical Study of Python Library Migration Using Large Language Models | Research Papers | Mohayeminul Islam University of Alberta, Ajay Jha North Dakota State University, May Mahmoud New York University Abu Dhabi, Ildar Akhmetov Northeastern University, Sarah Nadi New York University Abu Dhabi
14:30 | 10m | Talk | Measuring the Impact of Predictive Models on the Software Project: A Cost, Service Time, and Risk Evaluation of a Metric-based Defect Severity Prediction Model | Journal-First Track | Umamaheswara Sharma B National Institute of Technology, Calicut, Ravichandra Sadam National Institute of Technology Warangal
14:40 | 10m | Talk | Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories | Research Papers | xiaoning ren, Yuhang Ye University of Science and Technology of China, Xiongfei Wu University of Luxembourg, Yueming Wu Huazhong University of Science and Technology, Yinxing Xue Institute of AI for Industries, Chinese Academy of Sciences
14:50 | 10m | Talk | Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs | Research Papers | Zongze Jiang Huazhong University of Science and Technology, Ming Wen Huazhong University of Science and Technology, Ge Wen Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology
15:00 | 10m | Talk | MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution | Research Papers | Yibo Wang Northeastern University, Zhihao Peng Northeastern University, Ying Wang Northeastern University, Zhao Wei Tencent, Hai Yu Northeastern University, China, Zhiliang Zhu Northeastern University, China
15:10 | 10m | Talk | Software Reconfiguration in Robotics | Journal-First Track | Patrizio Pelliccione Gran Sasso Science Institute, L'Aquila, Italy, Sven Peldszus IT University of Copenhagen, Davide Brugali University of Bergamo, Italy, Daniel Strüber Chalmers | University of Gothenburg / Radboud University, Thorsten Berger Ruhr University Bochum
15:20 | 10m | Talk | CROSS2OH: Enabling Seamless Porting of C/C++ Software Libraries to OpenHarmony | Research Papers | Qian Zhang University of California at Riverside, Li Tsz On The Hong Kong University of Science and Technology, Ying Wang Northeastern University, Li Li Beihang University, Shing-Chi Cheung Hong Kong University of Science and Technology