ASTER: Natural and Multi-language Unit Test Generation with LLMs
Award Winner
Implementing automated unit tests is an important but time-consuming activity in software development. To support developers in this task, software engineering research over the past few decades has developed many techniques for automating unit test generation. However, despite this effort, usable tools exist for very few programming languages. Moreover, studies have found that automatically generated tests suffer from poor readability and often do not resemble developer-written tests. In this work, we present a rigorous investigation of how large language models (LLMs) can help bridge this gap. We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We illustrate how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking. We conducted an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness, evaluating them on standard as well as enterprise Java applications and a large Python benchmark. Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved, while also producing considerably more natural test cases that developers find easy to read and understand. We also present the results of a user study, conducted with 161 professional developers, that highlights the naturalness characteristics of the tests generated by our approach.
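To make the idea of statically guided prompting concrete, the following is a minimal sketch, not the authors' ASTER implementation: it uses Python's standard ast module to extract a focal method's source as context, assembles a test-generation prompt from it, and leaves the model invocation as a hypothetical placeholder (call_llm, build_prompt, and focal_method_context are illustrative names introduced here, not from the paper).

```python
# Illustrative sketch of static analysis guiding LLM-based test generation.
# The focal-method extraction uses Python's standard `ast` module;
# `call_llm` is a hypothetical placeholder for any chat-completion API.
import ast
import textwrap

SOURCE = textwrap.dedent("""
    def add(a: int, b: int) -> int:
        '''Return the sum of two integers.'''
        return a + b
""")

def focal_method_context(source: str, name: str) -> str:
    """Extract the focal method's source (signature, docstring, body) via the AST."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # ast.unparse (Python 3.9+) reconstructs source for the node.
            return ast.unparse(node)
    raise ValueError(f"method {name!r} not found")

def build_prompt(context: str) -> str:
    """Compose a test-generation prompt from the statically extracted context."""
    return (
        "Write a pytest unit test for the following function. "
        "The test must compile and exercise normal and edge cases.\n\n"
        f"{context}"
    )

prompt = build_prompt(focal_method_context(SOURCE, "add"))
print(prompt)
# response = call_llm(prompt)  # hypothetical LLM call; a real pipeline would
# compile-check the generated test, measure coverage, and retry with feedback.
```

In a full pipeline of this shape, the static analysis would also gather constructors, dependencies, and mockable interfaces of the focal class, so that the model receives exactly the context needed to produce compilable tests.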
Fri 2 May (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30
11:00 15m Talk | ASTER: Natural and Multi-language Unit Test Generation with LLMs (Award Winner) | SE In Practice (SEIP) | Rangeet Pan (IBM Research), Myeongsoo Kim (Georgia Institute of Technology), Rahul Krishna (IBM Research), Raju Pavuluri (IBM T.J. Watson Research Center), Saurabh Sinha (IBM Research) | Pre-print
11:15 15m Talk | Automated Code Review In Practice | SE In Practice (SEIP) | Umut Cihan (Bilkent University), Vahid Haratian (Bilkent University), Arda İçöz (Bilkent University), Mert Kaan Gül (Beko), Ömercan Devran (Beko), Emircan Furkan Bayendur (Beko), Baykal Mehmet Ucar (Beko), Eray Tüzün (Bilkent University)
11:30 15m Talk | CI at Scale: Lean, Green, and Fast | SE In Practice (SEIP) | Dhruva Juloori (Uber Technologies, Inc.), Zhongpeng Lin (Uber Technologies, Inc.), Matthew Williams (Uber Technologies, Inc.), Eddy Shin (Uber Technologies, Inc.), Sonal Mahajan (Uber Technologies, Inc.)
11:45 15m Talk | Moving Faster and Reducing Risk: Using LLMs in Release Deployment (Award Winner) | SE In Practice (SEIP) | Rui Abreu (Meta), Vijayaraghavan Murali (Meta Platforms, Inc.), Peter C Rigby (Meta / Concordia University), Chandra Sekhar Maddila (Meta Platforms, Inc.), Weiyan Sun (Meta Platforms, Inc.), Jun Ge (Meta Platforms, Inc.), Kaavya Chinniah (Meta Platforms, Inc.), Audris Mockus (The University of Tennessee), Megh Mehta (Meta Platforms, Inc.), Nachiappan Nagappan (Meta Platforms, Inc.)
12:00 15m Talk | Prioritizing Large-scale Natural Language Test Cases at OPPO | SE In Practice (SEIP) | Haoran Xu, Chen Zhi (Zhejiang University), Tianyu Xiang (Guangdong Oppo Mobile Telecommunications Corp., Ltd.), Zixuan Wu (Zhejiang University), Gaorong Zhang (Zhejiang University), Xinkui Zhao (Zhejiang University), Jianwei Yin (Zhejiang University), Shuiguang Deng (Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies)
12:15 15m Talk | Search+LLM-based Testing for ARM Simulators | SE In Practice (SEIP) | Bobby Bruce (University of California at Davis, USA), Aidan Dakhama (King's College London), Karine Even-Mendoza (King's College London), William B. Langdon (University College London), Hector Menendez (King's College London), Justyna Petke (University College London)