AI-based automated grading of source code of introductory programming assignments
This program is tentative and subject to change.
In a typical introductory programming course, grading student-submitted programs usually involves an autograder that tests the functionality of the programs with predefined test cases. However, in an educational setting, it is desirable for the educator to examine the source code before assigning the final grade, for reasons such as checking for compliance with certain criteria (e.g. ‘Use iteration, not recursion’, or ‘do not use additional arrays’). Graders often use a rubric to grade according to such criteria. However, manual grading of source code can be labor-intensive and impractical for large-scale online courses. Therefore, in this paper, we propose techniques based on Large Language Models for code to automatically grade student programs according to instructor-specified rubrics. Leveraging a graded dataset comprising 27 problems, each with approximately 140 submissions, to be graded over 210 criteria, we develop methodologies within the frameworks of Zero-Shot prompting, Few-Shot prompting, Supervised Fine-Tuning, and Direct Preference Optimization (DPO), and compare their effectiveness on nine different LLMs for code. We also investigate code scrambling and code augmentation as ways to make our models more robust. Our findings indicate that fine-tuning can yield a 10-15% improvement in accuracy and F1 scores for smaller models such as Phi-3B. Larger models such as Codestral 22B, however, achieved an accuracy of 86% and an F1 score of 81% both with and without fine-tuning, even for new, unique criteria specific to new problem statements that were not part of the fine-tuning. With these results, we believe we have a promising methodology and sufficient knowledge of SOTA models for actual deployment in autograder applications used in courses with large enrollments.
Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada)
14:00 - 15:30 | Education, Debugging, Dynamic Analysis | Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE) / Tool Demonstration | at 205
14:00 | 10m | Talk | JavaWiz: A Trace-Based Graphical Debugger for Software Development Education | Research Track | Markus Weninger JKU Linz, Simon Grünbacher Institute for System Software; Johannes Kepler University Linz, Austria, Herbert Prähofer Johannes Kepler University Linz
14:10 | 10m | Talk | Pinpointing the Learning Obstacles of an Interactive Theorem Prover | Research Track | Sára Juhošová Delft University of Technology, Andy Zaidman Delft University of Technology, Jesper Cockx Delft University of Technology | Pre-print
14:20 | 10m | Talk | AI-based automated grading of source code of introductory programming assignments | Research Track | Jayant Havare Indian Institute of technology - Bombay, Varsha Apte Indian Institute of technology - Bombay, Kaushikraj Maharajan Indian Institute of technology - Bombay, Nithin Chandra Gupta Samudrala Indian Institute of technology - Bombay, Ganesh Ramakrishnan Indian Institute of technology - Bombay, Srikanth Tamilselvam IBM Research, Sainath Vavilapalli Indian Institute of Technology - Bombay
14:30 | 10m | Talk | An Analysis of Students' Program Comprehension Processes in a Large Code Base | Research Track | Anshul Shah University of California, San Diego, Thanh Tong University of California, San Diego, Elena Tomson University of California, San Diego, Steven Shi University of California, San Diego, William G. Griswold University of California San Diego, Gerald Soosairaj University of California, San Diego
14:40 | 6m | Talk | OVERLORD: A C++ overloading inspector | Tool Demonstration | Botond Horváth ELTE Eötvös Loránd University, Budapest, Hungary, Richárd Szalay Eötvös Loránd University, Faculty of Informatics, Department of Programming Languages and Compilers, Zoltán Porkoláb ELTE Eötvös Loránd University, Budapest, Hungary
14:46 | 6m | Talk | Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation | Early Research Achievements (ERA) | Manish Acharya Vanderbilt University, Yifan Zhang Vanderbilt University, Kevin Leach Vanderbilt University, Yu Huang Vanderbilt University
14:52 | 6m | Talk | Investigating Execution-Aware Language Models for Code Optimization | Replications and Negative Results (RENE) | Federico Di Menna University of L'Aquila, Luca Traini University of L'Aquila, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Vittorio Cortellessa University of L'Aquila | Pre-print
14:58 | 6m | Talk | Understanding Data Access in Microservices Applications Using Interactive Treemaps | Early Research Achievements (ERA) | Maxime ANDRÉ Namur Digital Institute, University of Namur, Marco Raglianti Software Institute - USI, Lugano, Anthony Cleve University of Namur, Michele Lanza Software Institute - USI, Lugano | Pre-print
15:04 | 6m | Talk | Divergence-Driven Debugging: Understanding Behavioral Changes Between Two Program Versions | Early Research Achievements (ERA) | Rémi Dufloer Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France, Imen Sayar Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France, Anne Etien Université de Lille, CNRS, Inria, Centrale Lille, UMR 9189 –CRIStAL, Steven Costiou INRIA Lille
15:10 | 10m | Talk | KotSuite: Unit Test Generation for Kotlin Programs in Android Applications | Research Track | Feng Yang Wuhan University, Qi Xin Wuhan University, Zhilei Ren Dalian University of Technology, Jifeng Xuan Wuhan University
15:20 | 10m | Live Q&A | Session's Discussion: "Education, Debugging, Dynamic Analysis" | Research Track