Understanding and Mitigating Hallucinations in Industrial LLM-based Unit Test Generation
Unit testing plays a critical role in ensuring software quality and reliability in large-scale industrial environments. While Large Language Models (LLMs) offer promising capabilities for automated test generation, hallucinations pose significant challenges to their practical deployment. In this paper, we analyze compilation failures in LLM-generated unit tests from Ant Group’s production systems and identify two fundamental types of hallucinations: extrinsic hallucinations, caused by insufficient contextual information, and intrinsic hallucinations, which stem from model limitations and occur even when adequate context is provided. To address these issues, we propose DEHALL, an automated end-to-end unit test generation tool that systematically mitigates both types of hallucinations through comprehensive context construction and targeted static analysis-based repair. Our approach builds a heterogeneous graph to capture relevant context and employs specialized repair mechanisms for import, field, and method issues. Evaluation on Ant Group’s internal datasets shows that DEHALL achieves 71.56% line coverage and 67.18% branch coverage, significantly outperforming vanilla LLM approaches. On public benchmarks, it also achieves higher coverage and stronger defect detection capability than previous state-of-the-art approaches. DEHALL has been successfully deployed across multiple business domains at Ant Group, achieving an 81% developer adoption rate with positive user feedback on productivity improvements.
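To make the idea of static analysis-based import repair concrete, the following is a minimal sketch and not DEHALL's actual implementation, whose details the abstract does not give. It assumes the generated tests are Java source and that a hypothetical index (`CLASS_INDEX`) mapping simple class names to fully qualified names is available from the project. The function name `repair_imports` and the example classes are likewise illustrative assumptions. The regex-based scan is deliberately naive: it ignores wildcard imports and does not distinguish code from comments or string literals.

```python
import re

# Hypothetical index: simple class name -> fully qualified name.
# A real tool would presumably derive this from the project and its dependencies.
CLASS_INDEX = {
    "List": "java.util.List",
    "ArrayList": "java.util.ArrayList",
    "OrderService": "com.example.order.OrderService",
}

# Matches single-type (and static) import declarations; wildcard imports are ignored.
IMPORT_RE = re.compile(r"^import\s+(?:static\s+)?([\w.]+)\s*;", re.MULTILINE)
# Naive approximation of type references: any capitalized identifier.
TYPE_RE = re.compile(r"\b[A-Z]\w*\b")
# Matches the package declaration line, including its trailing newline if present.
PACKAGE_RE = re.compile(r"^package\s+[\w.]+\s*;[ \t]*\n?", re.MULTILINE)


def repair_imports(test_source, class_index):
    """Return test_source with imports added for used but unimported known types."""
    imported = {fqn.rsplit(".", 1)[-1] for fqn in IMPORT_RE.findall(test_source)}
    used = set(TYPE_RE.findall(test_source))
    missing = sorted(
        class_index[name]
        for name in used
        if name in class_index and name not in imported
    )
    if not missing:
        return test_source
    new_imports = "".join(f"import {fqn};\n" for fqn in missing)
    # Insert right after the package declaration, or at the top if there is none.
    match = PACKAGE_RE.search(test_source)
    pos = match.end() if match else 0
    return test_source[:pos] + new_imports + test_source[pos:]


# Example: the generated test uses ArrayList but never imports it.
generated = (
    "package com.example.order;\n"
    "import java.util.List;\n"
    "public class OrderServiceTest {\n"
    "    List<String> ids = new ArrayList<>();\n"
    "}\n"
)
repaired = repair_imports(generated, CLASS_INDEX)
```

In this sketch the missing import is resolved deterministically from the index, without querying the model again. That is one plausible reading of "targeted" repair, though the abstract does not state how DEHALL resolves such cases.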