Kitten: A Simple Yet Effective Baseline for Evaluating LLM-Based Compiler Testing Techniques
Compiler testing is indispensable for improving the correctness of compilers. Spurred by recent advances in Large Language Models (LLMs), LLM-based compiler testing techniques such as Fuzz4All have demonstrated their potential to uncover real bugs in diverse compilers while reducing the engineering effort required to design program generators. Given the continuous evolution of LLMs and the emergence of new LLM-based approaches, establishing robust baselines is crucial for rigorous evaluation and for driving future advances in this promising research direction.
To this end, we introduce Kitten, a mutation-based, language-agnostic program generator. Kitten leverages a corpus of seed programs, analogous to the training set of LLMs, and utilizes the target language’s syntax, akin to the knowledge learned by LLMs. Furthermore, Kitten’s mutation operators generate diverse test programs, mirroring the ability of LLM inference to generate new code.
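To make the mutation loop concrete, the sketch below shows one minimal instance of such a generator: draw a seed from the corpus, apply a randomly chosen mutation operator, compile the mutant, and flag abnormal compiler exits. This is an illustrative assumption of how a mutation-based baseline can operate, not Kitten’s actual implementation; in particular, the crude line-level operators here stand in for Kitten’s syntax-aware ones, and the directory layout, compiler command, and crash oracle are all hypothetical.

```python
import random
import subprocess
import tempfile
from pathlib import Path


def delete_span(lines, _donor):
    """Drop a random contiguous span of lines from the program."""
    if len(lines) < 2:
        return lines
    i = random.randrange(len(lines))
    j = random.randrange(i, len(lines))
    return lines[:i] + lines[j + 1:]


def splice_span(lines, donor):
    """Insert a random span copied from another seed program."""
    if not donor:
        return lines
    i = random.randrange(len(donor))
    j = random.randrange(i, len(donor))
    k = random.randrange(len(lines) + 1)
    return lines[:k] + donor[i:j + 1] + lines[k:]


MUTATORS = [delete_span, splice_span]


def fuzz(seed_dir, compiler_cmd, iterations=1000):
    """Repeatedly mutate seed programs and compile the mutants."""
    seeds = [p.read_text(errors="ignore").splitlines()
             for p in Path(seed_dir).iterdir() if p.is_file()]
    for _ in range(iterations):
        base, donor = random.choice(seeds), random.choice(seeds)
        mutant = "\n".join(random.choice(MUTATORS)(list(base), donor))
        with tempfile.NamedTemporaryFile("w", suffix=".c",
                                         delete=False) as f:
            f.write(mutant)
        try:
            result = subprocess.run(compiler_cmd + [f.name],
                                    capture_output=True, timeout=30)
        except subprocess.TimeoutExpired:
            continue  # a compiler hang would also be worth reporting
        # Crude oracle: ordinary rejections of invalid input exit with
        # 0 or 1; anything else (e.g., a signal, reported as a negative
        # code) suggests a compiler crash.
        if result.returncode not in (0, 1):
            print(f"potential compiler bug: {f.name} "
                  f"(exit status {result.returncode})")


if __name__ == "__main__":
    # Assumed setup: a ./seeds directory of C programs, tested with GCC.
    fuzz("seeds", ["gcc", "-O2", "-c", "-o", "/dev/null"])
```

Because the loop only needs a seed corpus and a set of mutation operators, swapping in another language’s test suite and syntax-aware operators retargets the same skeleton to a different compiler, which is what makes the approach language-agnostic.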
Our evaluation demonstrates that, using existing compiler test suites as seed programs, Kitten outperforms Fuzz4All in both code coverage and bug detection. Within 24 hours, Kitten achieved 48.3%, 9.9%, and 33.8% higher coverage than Fuzz4All on GCC, LLVM, and Rustc, respectively, while also identifying on average 19.3 bugs in GCC, 20.3 bugs in LLVM, and 15.7 bugs in Rustc. Over the nine months dedicated to Kitten’s development and testing, we identified a total of 328 bugs across the compilers GCC, LLVM, Rustc, Solc, JerryScript, scalac, and slang, of which 310 have been confirmed or fixed. We strongly believe that Kitten serves as an effective baseline, enabling the identification of limitations in existing LLM-based approaches and consequently driving advances in this promising research direction.
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:30 | LLM-based Testing 1 | Research Papers / Tool Demonstrations | Cosmos 3A | Chair(s): Qingkai Shi (Nanjing University)
14:00 (25m, Talk) | A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing | Research Papers | Ye Shang (Nanjing University), Quanjun Zhang (School of Computer Science and Engineering, Nanjing University of Science and Technology), Chunrong Fang (Nanjing University), Siqi Gu (Nanjing University), Jianyi Zhou (Huawei Cloud Computing Technologies Co., Ltd.), Zhenyu Chen (Nanjing University)
14:25 (25m, Talk) | Validating Network Protocol Parsers with Traceable RFC Document Interpretation | Research Papers | Mingwei Zheng (Purdue University), Danning Xie (Purdue University), Qingkai Shi (Nanjing University), Chengpeng Wang (Purdue University), Xiangyu Zhang (Purdue University)
14:50 (25m, Talk) | Tratto: A Neuro-Symbolic Approach to Deriving Axiomatic Test Oracles | Research Papers | Davide Molinelli (USI Lugano; Schaffhausen Institute of Technology), Alberto Martin-Lopez (Software Institute, USI Lugano), Elliott Zackrone (University of Washington), Beyza Eken (Sakarya University), Michael D. Ernst (University of Washington), Mauro Pezze (Università della Svizzera italiana (USI), Università degli Studi di Milano Bicocca, and CIT Constructor Institute of Technology)
15:15 (15m, Demonstration) | Kitten: A Simple Yet Effective Baseline for Evaluating LLM-Based Compiler Testing Techniques | Tool Demonstrations | Yuanmin Xie (Tsinghua University), Zhenyang Xu (University of Waterloo), Yongqiang Tian, Min Zhou, Xintong Zhou (University of Waterloo), Chengnian Sun (University of Waterloo)
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.