Automatic Unit Test Generation for Programming Assignments Using Large Language Models
Programming knowledge is a crucial aspect of computer science education, and unit testing is commonly employed to automatically assess programming assignments. Instructors and teaching assistants typically invest considerable effort in writing unit tests, which may still be vulnerable to human oversight and mistakes. In this work, we explore the feasibility of using Large Language Models (LLMs) to automate the assessment of programming assignments. In particular, we propose two approaches: a plain approach that uses GPT-4o-mini in a vanilla setting, and an augmented approach that integrates additional strategies such as tailored prompts with syntactic and semantic constraints, and a feedback mechanism that supplies information on test-effectiveness metrics.
We evaluate the two approaches on six real-world programming assignments from an introductory-level programming course at our university. Compared to the plain approach, the augmented approach improves the usability and effectiveness of the generated unit tests, reducing compilation errors by 85% while improving statement coverage and mutation scores by 1.7x and 2.1x, respectively. The augmented approach also complements human-written tests by covering additional program behaviors. In a case study of 1296 student submissions that pass human-written tests, the augmented approach successfully detected new bugs in 13% of submissions, with an accuracy of 27%. These results not only demonstrate the potential of LLMs to generate useful unit tests for programming assignments, but also highlight strategies that can effectively enhance LLMs’ capabilities to augment human-written tests, offering practical benefits for both educators and students.
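The abstract reports mutation scores as one of its test-effectiveness metrics. As a rough illustration of the standard definition (killed mutants divided by total mutants), the sketch below scores two tiny test suites against hand-written mutants of a `max` function. It is not the authors' tooling: the names `passes_all`, `mutation_score`, `reference_max`, and the mutants are all hypothetical, and a real setup would typically generate mutants with a dedicated mutation-testing tool.

```python
from typing import Callable, Sequence

# Illustrative stand-ins: a "program" is a two-argument function under test,
# and a "test" is a predicate that returns True when the program passes it.
Program = Callable[[int, int], int]
Test = Callable[[Program], bool]


def passes_all(program: Program, tests: Sequence[Test]) -> bool:
    """Return True if the program passes every test; a crash counts as a failure."""
    for test in tests:
        try:
            if not test(program):
                return False
        except Exception:
            return False
    return True


def mutation_score(mutants: Sequence[Program], tests: Sequence[Test]) -> float:
    """Fraction of mutants killed, i.e. mutants that fail at least one test."""
    if not mutants:
        raise ValueError("need at least one mutant")
    killed = sum(1 for mutant in mutants if not passes_all(mutant, tests))
    return killed / len(mutants)


# Hypothetical reference solution and three hand-written mutants.
def reference_max(a: int, b: int) -> int:
    return a if a >= b else b


def mutant_flipped(a: int, b: int) -> int:
    return a if a <= b else b  # comparison flipped: behaves like min


def mutant_always_a(a: int, b: int) -> int:
    return a  # ignores the second argument


def mutant_strict(a: int, b: int) -> int:
    return a if a > b else b  # equivalent mutant: same results as the reference


mutants = [mutant_flipped, mutant_always_a, mutant_strict]

# A weak suite with one test, and a stronger suite that adds a second case.
weak_tests = [lambda f: f(2, 1) == 2]
strong_tests = weak_tests + [lambda f: f(1, 2) == 2]

weak_score = mutation_score(mutants, weak_tests)      # 1 of 3 mutants killed
strong_score = mutation_score(mutants, strong_tests)  # 2 of 3 mutants killed
```

In this toy setup the second test raises the score from 1/3 to 2/3, while the equivalent mutant can never be killed. A feedback mechanism of the kind the abstract describes could, under these assumptions, report surviving mutants back to the test generator.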
Mon 28 Apr (times shown in Eastern Time, US & Canada)
14:00 - 15:30
14:00 20m Talk | Structured Analysis of Software Testing Education in Higher Education in Germany | CSEE&T | Christine Jokisch (University of Goettingen), Katharina Schramm (University of Goettingen), Sebastian Hobert (TH Luebeck), Lars Wilhelmi (University of Goettingen), Matthias Schumann (University of Goettingen)
14:20 20m Talk | Gaps in Software Testing Education: A Survey of Academic Courses in Sweden | CSEE&T | Ayodele Barrett (Mälardalen University), Eduard P. Enoiu (Mälardalen University), Wasif Afzal (Mälardalen University)
14:40 20m Talk | Automatic Unit Test Generation for Programming Assignments Using Large Language Models | CSEE&T | ZHENG Kaisheng (Southern University of Science and Technology), Yuanyang Shen (Southern University of Science and Technology), Yida Tao (Southern University of Science and Technology)
15:00 20m Talk | Test Early: An Approach to Introduce Younger Students to Unit Testing with Graphical Comparison in Scratch | CSEE&T