AI-based automated grading of source code of introductory programming assignments (ICPC 2025 - Research Track)

Who

Jayant Havare, Varsha Apte, Kaushikraj Maharajan, Nithin Chandra Gupta Samudrala, Ganesh Ramakrishnan, Srikanth Tamilselvam, Sainath Vavilapalli

Track

ICPC 2025 Research Track

When

Sun 27 Apr 2025 14:20 - 14:30 at 205 - Education, Debugging, Dynamic Analysis Chair(s): Simone Scalabrino, Coen De Roover, Gema Rodríguez-Pérez

Abstract

In a typical introductory programming course, student-submitted programs are usually graded by an autograder that tests their functionality against predefined test cases. However, in an educational setting, it is desirable for the educator to examine the source code before assigning the final grade, for reasons such as checking compliance with certain criteria (e.g. ‘Use iteration, not recursion’, or ‘do not use additional arrays’). Graders often use a rubric to grade according to such criteria. However, manual grading of source code can be labor-intensive and impractical for large-scale online courses. Therefore, in this paper, we propose techniques based on Large Language Models for code to automatically grade student programs according to instructor-specified rubrics. Leveraging a graded dataset comprising 27 problems, each with approximately 140 submissions, to be graded over 210 criteria, we develop methodologies within the frameworks of Zero-Shot prompting, Few-Shot prompting, Supervised Fine-Tuning, and Direct Preference Optimization (DPO), and compare their effectiveness on nine different LLMs for code. We also investigate code scrambling and code augmentation as ways to make our models more robust. Our findings indicate that fine-tuning can yield a 10-15% improvement in accuracy and F1 scores for smaller models such as Phi-3B. Larger models such as Codestral 22B, however, achieved an accuracy of 86% and an F1 score of 81% both with and without fine-tuning, even for new, unique criteria specific to new problem statements that were not part of the fine-tuning. With these results, we believe we have a promising methodology and sufficient knowledge of SOTA models for actual deployment in autograder applications used in courses with large enrollments.
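
To make the zero-shot setting and the reported metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes each rubric criterion is judged as a binary YES/NO, which the abstract does not state. The prompt wording and the helper names (`build_zero_shot_prompt`, `parse_verdict`, `accuracy_and_f1`) are hypothetical, and the call to an actual LLM is deliberately left out.

```python
from typing import List, Tuple


def build_zero_shot_prompt(problem: str, criterion: str, code: str) -> str:
    """Assemble a zero-shot grading prompt for one rubric criterion.

    The wording is an illustrative assumption, not the prompt used in the paper.
    """
    return (
        "You are grading a student's program against one rubric criterion.\n"
        f"Problem statement:\n{problem}\n\n"
        f"Criterion: {criterion}\n\n"
        f"Student submission:\n{code}\n\n"
        "Answer with exactly one word: YES if the criterion is satisfied, "
        "NO otherwise."
    )


def parse_verdict(reply: str) -> bool:
    """Map a model reply to a boolean verdict (assumes a YES/NO answer)."""
    return reply.strip().upper().startswith("YES")


def accuracy_and_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float]:
    """Compute accuracy and binary F1 of predicted verdicts against grader labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    prompt = build_zero_shot_prompt(
        problem="Compute the factorial of n.",
        criterion="Use iteration, not recursion",
        code="int f(int n) { return n <= 1 ? 1 : n * f(n - 1); }",
    )
    print(prompt)
    # Hypothetical grader labels and model verdicts for four submissions.
    print(accuracy_and_f1([True, True, False, False], [True, False, False, True]))
```

In a real pipeline the prompt would be sent to a code LLM and the reply passed through `parse_verdict`; a few-shot variant would prepend graded examples to the same prompt.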

Jayant Havare

Indian Institute of Technology - Bombay

Varsha Apte

Indian Institute of Technology - Bombay

Kaushikraj Maharajan

Indian Institute of Technology - Bombay

Nithin Chandra Gupta Samudrala

Indian Institute of Technology - Bombay

Ganesh Ramakrishnan

Indian Institute of Technology - Bombay

Srikanth Tamilselvam

IBM Research

Sainath Vavilapalli

Indian Institute of Technology - Bombay

Session Program

Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Education, Debugging, Dynamic Analysis
Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE) / Tool Demonstration, at 205
Chair(s): Simone Scalabrino (University of Molise), Coen De Roover (Vrije Universiteit Brussel), Gema Rodríguez-Pérez (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)

14:00 10m Talk		JavaWiz: A Trace-Based Graphical Debugger for Software Development Education | Research Track | Markus Weninger (JKU Linz), Simon Grünbacher (Institute for System Software; Johannes Kepler University Linz, Austria), Herbert Prähofer (Johannes Kepler University Linz) | Pre-print
14:10 10m Talk		Pinpointing the Learning Obstacles of an Interactive Theorem Prover | Research Track | Sára Juhošová (Delft University of Technology), Andy Zaidman (TU Delft), Jesper Cockx (Delft University of Technology) | Pre-print
14:20 10m Talk		AI-based automated grading of source code of introductory programming assignments | Research Track | Jayant Havare (Indian Institute of Technology - Bombay), Varsha Apte (Indian Institute of Technology - Bombay), Kaushikraj Maharajan (Indian Institute of Technology - Bombay), Nithin Chandra Gupta Samudrala (Indian Institute of Technology - Bombay), Ganesh Ramakrishnan (Indian Institute of Technology - Bombay), Srikanth Tamilselvam (IBM Research), Sainath Vavilapalli (Indian Institute of Technology - Bombay)
14:30 10m Talk		An Analysis of Students' Program Comprehension Processes in a Large Code Base | Research Track | Anshul Shah (University of California, San Diego), Thanh Tong (University of California, San Diego), Elena Tomson (University of California, San Diego), Steven Shi (University of California, San Diego), William G. Griswold (University of California San Diego), Gerald Soosairaj (University of California, San Diego)
14:40 6m Talk		OVERLORD: A C++ overloading inspector | Tool Demonstration | Botond Horváth (ELTE Eötvös Loránd University, Budapest, Hungary), Richárd Szalay (Eötvös Loránd University, Faculty of Informatics, Department of Programming Languages and Compilers), Zoltán Porkoláb (ELTE Eötvös Loránd University, Budapest, Hungary) | Pre-print
14:46 6m Talk		Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation | Early Research Achievements (ERA) | Manish Acharya (Vanderbilt University), Yifan Zhang (Vanderbilt University), Kevin Leach (Vanderbilt University), Yu Huang (Vanderbilt University)
14:52 6m Talk		Investigating Execution-Aware Language Models for Code Optimization | Replications and Negative Results (RENE) | Federico Di Menna (University of L'Aquila), Luca Traini (University of L'Aquila), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana), Vittorio Cortellessa (University of L'Aquila) | Pre-print
14:58 6m Talk		Understanding Data Access in Microservices Applications Using Interactive Treemaps | Early Research Achievements (ERA) | Maxime ANDRÉ (Namur Digital Institute, University of Namur), Marco Raglianti (Software Institute - USI, Lugano), Anthony Cleve (University of Namur), Michele Lanza (Software Institute - USI, Lugano) | Pre-print
15:04 6m Talk		Divergence-Driven Debugging: Understanding Behavioral Changes Between Two Program Versions | Early Research Achievements (ERA) | Rémi Dufloer (Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France), Imen Sayar (Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France), Anne Etien (University of Lille, Lille, France), Steven Costiou (INRIA Lille)
15:10 10m Talk		Effectively Modeling UI Transition Graphs for Android Apps via Reinforcement Learning | Research Track | Wunan Guo (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology), Zhen Dong (Fudan University), Liwei Shen (Fudan University), Daihong Zhou (School of Computer Science and Information Engineering, Shanghai Institute of Technology), Bin Hu (Fudan University), Chen Zhang (Fudan University), Hai Xue (University of Shanghai for Science and Technology)
15:20 10m Live Q&A		Session's Discussion: "Education, Debugging, Dynamic Analysis" | Research Track