Applying Large Language Models to Enhance the Assessment of Java Programming Assignments
The assessment of programming assignments in computer science (CS) education traditionally relies on manual grading, which strives to provide comprehensive feedback on correctness, style, efficiency, and other software quality attributes. As class sizes increase, however, providing detailed feedback consistently becomes difficult, especially when multiple assessors are required to handle a larger number of submissions. Large Language Models (LLMs) such as ChatGPT offer a promising alternative for automating this process in a consistent, scalable, and fair manner.
This paper explores the efficacy of ChatGPT-4 and other popular LLMs in automating programming assignment evaluation. We conduct a series of studies within multiple Java-based CS courses, comparing LLM-generated assessments to those produced by human graders. The analysis focuses on key metrics, including accuracy, precision, recall, efficiency, and consistency, in identifying programming mistakes against predefined rubrics. Our findings demonstrate that, with appropriate prompt engineering and feature selection, LLMs improve grading objectivity and efficiency, serving as a valuable complementary tool to human graders in undergraduate and graduate CS education.
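As a minimal sketch of how such a comparison can be scored, the Java snippet below treats the human grader's flagged rubric violations as ground truth and the LLM's flagged violations as predictions, then computes standard precision and recall. The rubric-item names and class are hypothetical illustrations, not the paper's actual rubrics or tooling.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: scoring LLM-flagged rubric violations against
// human-graded ground truth for one submission (hypothetical rubric IDs).
public class RubricAgreement {

    static double precision(Set<String> llmFlags, Set<String> humanFlags) {
        if (llmFlags.isEmpty()) return 0.0;
        Set<String> truePositives = new HashSet<>(llmFlags);
        truePositives.retainAll(humanFlags);          // mistakes flagged by both
        return (double) truePositives.size() / llmFlags.size();
    }

    static double recall(Set<String> llmFlags, Set<String> humanFlags) {
        if (humanFlags.isEmpty()) return 0.0;
        Set<String> truePositives = new HashSet<>(llmFlags);
        truePositives.retainAll(humanFlags);          // mistakes flagged by both
        return (double) truePositives.size() / humanFlags.size();
    }

    public static void main(String[] args) {
        // Hypothetical rubric-item IDs; real IDs would come from the course rubric.
        Set<String> humanFlags = Set.of("missing-null-check", "magic-number", "no-javadoc");
        Set<String> llmFlags   = Set.of("missing-null-check", "magic-number", "unused-import");

        System.out.printf("precision = %.2f%n", precision(llmFlags, humanFlags)); // 0.67
        System.out.printf("recall    = %.2f%n", recall(llmFlags, humanFlags));    // 0.67
    }
}
```

Aggregating these per-submission scores across a course would then yield the kind of accuracy, precision, and recall figures the study reports when comparing LLM and human assessments.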