Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition (ASE 2025 - Research Papers)

Who

Yue Wang, Jiaxuan Sun, Yanzhen Zou, Bing Xie

Track

ASE 2025 Research Papers

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 18 Nov 2025 14:00 - 14:10 at Grand Hall 3 - Maintenance & Evolution 1

Abstract

God Header Files, large header files included by numerous other code files, present significant challenges for code comprehension and maintenance while also increasing recompilation time. Existing approaches leverage various code similarity metrics to decompose such header files, but these metrics do not always capture the code’s functional essence accurately. Large Language Models (LLMs), with their advanced capabilities in code understanding and generation, offer a promising alternative for producing more effective refactorings. However, LLMs face limitations with lengthy code files due to token restrictions and reduced effectiveness in processing long inputs. Additionally, purely LLM-based solutions often suffer from hallucination, producing incomplete or spurious decomposition results. To address these challenges, we propose HFDecomposer, a hybrid approach that enhances LLMs with staged grouping and dehallucination techniques to effectively decompose header files. Our approach introduces a two-stage grouping framework for lengthy header files: it first groups strongly related code entities using traditional similarity metrics, then feeds group summaries to the LLM for higher-level semantic aggregation. To mitigate LLM hallucinations, we enhance prompts with factual knowledge extracted from static analysis, detect errors in LLM output, and make necessary corrections by reassigning missing entities and resolving cyclic dependencies. Our evaluation on real-world header file decomposition refactorings demonstrates that our method effectively overcomes the limitations of purely LLM-based techniques and outperforms the traditional state-of-the-art approach by 11%, delivering more accurate and reliable decomposition results. Our approach enables LLMs to handle lengthy header files efficiently, significantly reduces hallucinations, and ensures the reliability and practicality of the final decomposition.
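The dehallucination step described in the abstract (detecting errors in the LLM output, reassigning missing entities, and resolving cyclic dependencies) can be illustrated with a minimal Python sketch. This is not HFDecomposer's implementation: the function names, the data layout, the fallback group name "misc", the "most dependency links wins" reassignment rule, and the strategy of merging groups that form a cycle are all assumptions made for illustration.

```python
from collections import defaultdict

def repair_decomposition(entities, deps, llm_groups):
    """Drop hallucinated names, then place every missing entity in a group.

    entities:   set of entity names found by static analysis (ground truth)
    deps:       dict mapping an entity to the set of entities it depends on
    llm_groups: dict mapping a group name to the entity list the LLM proposed
    """
    groups, assigned = {}, {}
    # Keep only real entities, and only the first assignment of each one.
    for name, members in llm_groups.items():
        kept = set()
        for e in members:
            if e in entities and e not in assigned:
                assigned[e] = name
                kept.add(e)
        if kept:
            groups[name] = kept
    # Assumed rule: a missing entity joins the group it shares most links with.
    for e in sorted(entities - set(assigned)):
        votes = defaultdict(int)
        for a, targets in deps.items():
            for b in targets:
                if a == e and b in assigned:
                    votes[assigned[b]] += 1
                elif b == e and a in assigned:
                    votes[assigned[a]] += 1
        target = max(sorted(votes), key=votes.get) if votes else "misc"
        groups.setdefault(target, set()).add(e)
        assigned[e] = target
    return groups

def reachable(start, edges):
    """Return all groups reachable from start via group-level dependencies."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def merge_cycles(groups, deps):
    """Assumed strategy: merge groups that depend on each other cyclically."""
    owner = {e: g for g, members in groups.items() for e in members}
    edges = defaultdict(set)
    for a, targets in deps.items():
        for b in targets:
            if a in owner and b in owner and owner[a] != owner[b]:
                edges[owner[a]].add(owner[b])
    reach = {g: reachable(g, edges) for g in groups}
    merged, done = {}, set()
    for g in sorted(groups):
        if g in done:
            continue
        # g plus every group that is mutually reachable with g.
        component = {g} | {h for h in reach[g] if g in reach[h]}
        merged[g] = set().union(*(groups[h] for h in component))
        done |= component
    return merged

# Hypothetical input: "vec_norm" does not exist, "log_write" was omitted,
# and the "vector" and "matrix" groups depend on each other.
entities = {"Vec", "vec_add", "to_mat", "Mat", "mat_mul", "log_init", "log_write"}
deps = {
    "vec_add": {"Vec"},
    "to_mat": {"Vec", "Mat"},
    "mat_mul": {"Mat", "Vec"},
    "log_write": {"log_init"},
}
llm_groups = {
    "vector": ["Vec", "vec_add", "to_mat", "vec_norm"],
    "matrix": ["Mat", "mat_mul"],
    "logging": ["log_init"],
}
repaired = repair_decomposition(entities, deps, llm_groups)
final = merge_cycles(repaired, deps)
```

In this toy input, the spurious name is discarded, `log_write` joins the `logging` group because its only dependency lives there, and the mutually dependent `vector` and `matrix` groups collapse into one. Merging is the bluntest way to break a cycle; moving individual entities between groups would preserve finer granularity, and the abstract does not say which strategy the authors use.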

Yue Wang

Peking University

Jiaxuan Sun

Peking University

Yanzhen Zou

Peking University

Bing Xie

Peking University

Session Program

Tue 18 Nov
Displayed time zone: Seoul

14:00 - 15:30	Maintenance & Evolution 1 (Research Papers / Journal-First Track) at Grand Hall 3

14:00 10m Talk		Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition	Research Papers	Yue Wang, Peking University; Jiaxuan Sun, Peking University; Yanzhen Zou, Peking University; Bing Xie, Peking University
14:10 10m Research paper		Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution	Research Papers	Raffi Khatchadourian, CUNY Hunter College; Tatiana Castro Vélez, University of Puerto Rico, Rio Piedras Campus; Mehdi Bagherzadeh, Oakland University; Nan Jia, City University of New York (CUNY) Graduate Center; Anita Raja, City University of New York (CUNY) Hunter College
14:20 10m Talk		An Empirical Study of Python Library Migration Using Large Language Models	Research Papers	Mohayeminul Islam, University of Alberta; Ajay Jha, North Dakota State University; May Mahmoud, New York University Abu Dhabi; Ildar Akhmetov, Northeastern University; Sarah Nadi, New York University Abu Dhabi
14:30 10m Talk		Measuring the Impact of Predictive Models on the Software Project: A Cost, Service Time, and Risk Evaluation of a Metric-based Defect Severity Prediction Model	Journal-First Track	Umamaheswara Sharma B, National Institute of Technology, Calicut; Ravichandra Sadam, National Institute of Technology Warangal
14:40 10m Talk		Demystifying the Evolution of Neural Networks with BOM Analysis: Insights from a Large-Scale Study of 55,997 GitHub Repositories	Research Papers	Xiaoning Ren; Yuhang Ye, University of Science and Technology of China; Xiongfei Wu, University of Luxembourg; Yueming Wu, Huazhong University of Science and Technology; Yinxing Xue, Institute of AI for Industries, Chinese Academy of Sciences
14:50 10m Talk		Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs	Research Papers	Zongze Jiang, Huazhong University of Science and Technology; Ming Wen, Huazhong University of Science and Technology; Ge Wen, Huazhong University of Science and Technology; Hai Jin, Huazhong University of Science and Technology
15:00 10m Talk		MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution	Research Papers	Yibo Wang, Northeastern University; Zhihao Peng, Northeastern University; Ying Wang, Northeastern University; Zhao Wei, Tencent; Hai Yu, Northeastern University, China; Zhiliang Zhu, Northeastern University, China
15:10 10m Talk		Software Reconfiguration in Robotics	Journal-First Track	Patrizio Pelliccione, Gran Sasso Science Institute, L'Aquila, Italy; Sven Peldszus, IT University of Copenhagen; Davide Brugali, University of Bergamo, Italy; Daniel Strüber, Chalmers | University of Gothenburg / Radboud University; Thorsten Berger, Ruhr University Bochum
15:20 10m Talk		CROSS2OH: Enabling Seamless Porting of C/C++ Software Libraries to OpenHarmony	Research Papers	Qian Zhang, University of California at Riverside; Li Tsz On, The Hong Kong University of Science and Technology; Ying Wang, Northeastern University; Li Li, Beihang University; Shing-Chi Cheung, Hong Kong University of Science and Technology