"No Free Lunch" when using Large Language Models to Verify Self-Generated Programs
Large Language Models (LLMs) have shown great success in a wide range of text-generation tasks, including the synthesis of code from natural language descriptions. As LLM-based techniques continue to grow in popularity, especially among entry-level developers, LLM-generated code has the potential to be deployed in a diverse set of application domains. While LLMs can generate syntactically correct code, recent work has shown the presence of nonsensical and faulty reasoning in LLM-generated text. As such, overreliance on LLMs for software generation may result in the deployment of faulty software, leading to critical system failures. This study explores the capabilities of a single LLM to generate both software and corresponding test suites from the same initial program descriptions, which can be considered analogous to an individual developer both writing and unit testing a given piece of software. We present an empirical framework and evaluation methodology to assess the usefulness of LLM-generated test cases for verifying programs generated by the same LLM. Our findings indicate that LLMs frequently generate irrelevant tests that suffer from numerous quality concerns.
Tue 28 May (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30
11:00 | 22m | Talk | "No Free Lunch" when using Large Language Models to Verify Self-Generated Programs | AIST
11:22 | 22m | Talk | An End-to-End Test Case Prioritization Framework using Optimized Machine Learning Models | AIST | Md Asif Khan (Ontario Tech University), Akramul Azim (Ontario Tech University), Ramiro Liscano (Ontario Tech University), Kevin Smith (International Business Machines Corporation (IBM)), Yee-Kang Chang (International Business Machines Corporation (IBM)), Qasim Tauseef (International Business Machines Corporation (IBM)), Gkerta Seferi (International Business Machines Corporation (IBM))
11:45 | 22m | Talk | Iterative Optimization of Hyperparameter-based Metamorphic Transformations | AIST | Gaadha Sudheerbabu (Åbo Akademi University), Tanwir Ahmad (Åbo Akademi University), Dragos Truscan (Åbo Akademi University), Jüri Vain (Tallinn University of Technology, Estonia), Ivan Porres (Åbo Akademi University)
12:07 | 22m | Talk | Machine Learning for Cross-Vulnerability Prediction in Smart Contracts | AIST