Automatic Unit Test Generation for Programming Assignments Using Large Language Models (CSEE&T 2025 - IEEE Conference on Software Engineering Education and Training (CSEE&T))

Who

ZHENG Kaisheng, Yuanyang Shen, Yida Tao

Track

CSEE&T 2025 Software Engineering Education

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 28 Apr 2025, 14:40 - 15:00, at 206 - Tools and Testing. Chair(s): Robert Chatley

Abstract

Programming knowledge is a crucial aspect of computer science education, and unit testing is commonly employed to automatically assess programming assignments. Instructors and teaching assistants typically invest considerable effort in writing unit tests, which may still be vulnerable to human oversight and mistakes. In this work, we explore the feasibility of using Large Language Models (LLMs) to automate the assessment of programming assignments. In particular, we propose two approaches: the plain approach, which uses GPT-4o-mini in a vanilla setting, and the augmented approach, which integrates additional strategies such as tailored prompts with syntactic and semantic constraints and a feedback mechanism that reports test-effectiveness metrics.

We evaluate the two approaches on six real-world programming assignments from an introductory programming course at our university. Compared to the plain approach, the augmented approach improves the usability and effectiveness of the generated unit tests, reducing compilation errors by 85% while improving statement coverage and mutation scores by 1.7x and 2.1x, respectively. The augmented approach also complements human-written tests by covering additional program behaviors. In a case study of 1296 student submissions that pass the human-written tests, the augmented approach detected new bugs in 13% of submissions, with an accuracy of 27%. These results not only demonstrate the potential of LLMs to generate useful unit tests for programming assignments, but also highlight strategies that can effectively enhance LLMs’ ability to augment human-written tests, offering practical benefits for both educators and students.

ZHENG Kaisheng

Southern University of Science and Technology

Yuanyang Shen

Southern University of Science and Technology

Yida Tao

Southern University of Science and Technology

Session Program

Mon 28 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Tools and Testing (CSEE&T) at 206. Chair(s): Robert Chatley (Imperial College London)

14:00 20m Talk	Structured Analysis of Software Testing Education in Higher Education in Germany. Christine Jokisch (University of Goettingen), Katharina Schramm (University of Goettingen), Sebastian Hobert (TH Luebeck), Lars Wilhelmi (University of Goettingen), Matthias Schumann (University of Goettingen)
14:20 20m Talk	Gaps in Software Testing Education: A Survey of Academic Courses in Sweden. Ayodele Barrett (Mälardalen University), Eduard P. Enoiu (Mälardalen University), Wasif Afzal (Mälardalen University)
14:40 20m Talk	Automatic Unit Test Generation for Programming Assignments Using Large Language Models. ZHENG Kaisheng (Southern University of Science and Technology), Yuanyang Shen (Southern University of Science and Technology), Yida Tao (Southern University of Science and Technology)
15:00 20m Talk	Test Early: An Approach to Introduce Younger Students to Unit Testing with Graphical Comparison in Scratch. Herart Dominggus Nurue (University of Alabama), Jeff Gray (University of Alabama)