Planning to Guide LLM for Code Coverage Prediction (Full Paper)
Code coverage is a crucial metric for assessing testing effectiveness, measuring the degree to which a test suite exercises different facets of the code, such as statements, branches, or paths. Despite its significance, coverage profilers require access to the entire codebase, limiting their usefulness when the code is incomplete, or when execution is infeasible or cost-prohibitive. While Large Language Models (LLMs) have shown success in predicting code coverage, achieving high accuracy remains challenging due to the intricate, expansive space of interdependent execution steps in a program. In this paper, we present CCPlan, a plan-based prompting approach grounded in program semantics that collaborates with LLMs to improve code coverage prediction. To tame the intricacies of coverage prediction, CCPlan plans by discerning the different types of statements in an execution flow. Guided by examples, the GPT model autonomously generates an execution plan (Reasoning), and CCPlan then prompts it to predict code coverage based on that plan (Action). Our experiments show that CCPlan achieves high accuracy: up to 55% in exact-match and 89% in statement-match, up to 33% and 19% relatively higher than the baselines on those metrics. We also show that highly accurate plans (90%) lead the GPT model to better coverage predictions. Moreover, we demonstrate CCPlan's utility in correctly predicting the least-covered statements as a downstream task.
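The abstract outlines a two-phase prompting protocol: the model first drafts an execution plan over the statements of the program (Reasoning), and is then prompted to predict coverage conditioned on that plan (Action). The sketch below shows one way such a pipeline could be wired up. It is a minimal illustration, not CCPlan's actual artifact: the OpenAI chat-completions calls are real API, but the prompt wording, the `>`/`!` line markers, and the helper `predict_coverage` are assumptions made for this example.

```python
# Minimal sketch of plan-then-predict prompting for coverage prediction.
# Prompt texts and output markers are illustrative assumptions, not the
# paper's actual prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PLAN_INSTRUCTIONS = (
    "You are given a method and a test input. Write an execution plan: "
    "for each statement, classify its type (assignment, branch condition, "
    "loop header, return, ...) and state whether control flow reaches it "
    "under the given input."
)

PREDICT_INSTRUCTIONS = (
    "Using the plan above, reproduce the method line by line, prefixing "
    "each line with '>' if it is executed and '!' if it is not."
)

def predict_coverage(method_src: str, test_input: str,
                     model: str = "gpt-3.5-turbo") -> str:
    # Phase 1 (Reasoning): the model generates an execution plan.
    plan = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PLAN_INSTRUCTIONS},
            {"role": "user",
             "content": f"Method:\n{method_src}\n\nInput: {test_input}"},
        ],
    ).choices[0].message.content

    # Phase 2 (Action): the coverage prediction is conditioned on the plan.
    prediction = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PREDICT_INSTRUCTIONS},
            {"role": "user",
             "content": (f"Method:\n{method_src}\n\nInput: {test_input}"
                         f"\n\nPlan:\n{plan}")},
        ],
    ).choices[0].message.content
    return prediction
```

The design intuition, as the abstract frames it, is that conditioning the coverage prediction on an explicit statement-level plan narrows the otherwise expansive space of interdependent execution steps the model must reason over in a single shot.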
Sun 14 Apr (time zone: Lisbon)
11:00 - 12:30 | Foundation Models for Software Quality Assurance (Research Track), at Luis de Freitas Branco
Chair(s): Matteo Ciniselli (Università della Svizzera Italiana)

11:00 (14m, Full Paper): Deep Multiple Assertions Generation. Research Track.

11:14 (14m, Full Paper): MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation. Research Track.
  Guanyu Wang (Beijing University of Posts and Telecommunications), Yuekang Li (The University of New South Wales), Yi Liu (Nanyang Technological University), Gelei Deng (Nanyang Technological University), Li Tianlin (Nanyang Technological University), Guosheng Xu (Beijing University of Posts and Telecommunications), Yang Liu (Nanyang Technological University), Haoyu Wang (Huazhong University of Science and Technology), Kailong Wang (Huazhong University of Science and Technology)

11:28 (14m, Full Paper): Planning to Guide LLM for Code Coverage Prediction. Research Track.
  Hridya Dhulipala (University of Texas at Dallas), Aashish Yadavally (University of Texas at Dallas), Tien N. Nguyen (University of Texas at Dallas)

11:42 (7m, New Idea Paper): The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks. Research Track.
  Ashwin Prasad Shivarpatna Venkatesh (University of Paderborn), Samkutty Sabu (University of Paderborn), Amir Mir (Delft University of Technology), Sofia Reis (Instituto Superior Técnico, U. Lisboa & INESC-ID), Eric Bodden

11:49 (14m, Full Paper): Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models. Research Track.
  Jiahui Wu (Simula Research Laboratory and University of Oslo), Chengjie Lu (Simula Research Laboratory and University of Oslo), Aitor Arrieta (Mondragon University), Tao Yue (Beihang University), Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University)

12:03 (7m, New Idea Paper): Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases. Research Track.
  Kimya Khakzad Shahandashti (York University), Mithila Sivakumar (York University), Mohammad Mahdi Mohajer (York University), Alvine Boaye Belle (York University), Song Wang (York University), Timothy Lethbridge (University of Ottawa)

12:10 (20m): Discussion. Research Track.