Can Test Generation and Program Repair Inform Automated Assessment of Programming Projects?
Computer Science educators assessing student programming assignments are typically responsible for two challenging tasks: grading and providing feedback. Producing grades that are fair and feedback that is useful to students is a goal common to most educators. In this context, automated test generation and program repair offer promising solutions for detecting bugs and suggesting corrections in students’ code, which could be leveraged to inform grading and feedback generation. Previous research on the applicability of these techniques to simple programming tasks (e.g., single-method algorithms) has yielded encouraging results, but their effectiveness for more complex programming tasks remains unexplored. To fill this gap, this paper investigates the feasibility of applying existing test generation and program repair tools to the assessment of complex programming assignment projects. In a case study using a real-world Java programming assignment project with 296 incorrect student submissions, we found that the generated tests failed to detect bugs in over 50% of cases, while full repairs could be automatically generated for only 2.1% of submissions. Our findings reveal significant limitations in current tools for detecting bugs in and repairing student submissions, highlighting the need for more advanced techniques to support the automated assessment of complex assignment projects.