Improving Dynamic Specification Inference with LLM-Generated Counterexamples

Contract assertions, such as preconditions, postconditions, and invariants, play a crucial role in software development, enabling applications such as program verification, test generation, and debugging. Despite these benefits, adopting contract assertions remains difficult, because producing them manually is hard. Dynamic analysis-based approaches, such as Daikon, can aid in this task by inferring expressive assertions from execution traces. However, a fundamental weakness of these methods is their reliance on the thoroughness of the underlying test suites used for dynamic analysis. When these test suites are not thorough enough or do not contain sufficiently diverse tests, the inferred assertions often fail to generalize, leading to a high rate of invalid candidates (false positives) that must be manually filtered out.
In this paper, we explore the use of large language models (LLMs) to automatically generate tests that attempt to invalidate inferred assertions. Our results show that state-of-the-art LLMs can generate effective counterexamples that help to discard up to 11.68% of the invalid assertions inferred by SpecFuzzer. Moreover, when these LLM-generated counterexamples are incorporated into the dynamic analysis process, we observe an improvement of up to 7% in the precision of the inferred specifications, measured against the ground truth gathered from the evaluation benchmarks, without affecting recall.
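The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use Daikon's or SpecFuzzer's actual interfaces: the method `abs_diff`, the candidate assertions, the helper `surviving`, and both input lists are hypothetical, and the counterexample inputs are hard-coded where an LLM would be asked to propose them.

```python
from typing import Callable, Dict, List, Tuple

Input = Tuple[int, int]
# A candidate postcondition: (a, b, result) -> does the assertion hold?
Assertion = Callable[[int, int, int], bool]


def abs_diff(a: int, b: int) -> int:
    """Hypothetical method under analysis."""
    return abs(a - b)


# Hypothetical candidate postconditions, in the spirit of what a dynamic
# inference tool might propose. Only the first one is valid in general.
CANDIDATES: Dict[str, Assertion] = {
    "result >= 0": lambda a, b, r: r >= 0,
    "result <= a": lambda a, b, r: r <= a,
    "result == a - b": lambda a, b, r: r == a - b,
}


def surviving(candidates: Dict[str, Assertion],
              inputs: List[Input]) -> Dict[str, Assertion]:
    """Keep only the candidates that hold on every observed execution."""
    return {
        name: check
        for name, check in candidates.items()
        if all(check(a, b, abs_diff(a, b)) for a, b in inputs)
    }


# A test suite that is not diverse enough: a >= b >= 0 in every test,
# so all three candidates are consistent with the observed executions.
weak_suite: List[Input] = [(5, 2), (9, 9), (7, 0)]
inferred = surviving(CANDIDATES, weak_suite)

# Stand-ins for LLM-generated counterexample tests (assumed, hard-coded).
counterexamples: List[Input] = [(2, 5), (-1, 4)]
filtered = surviving(inferred, counterexamples)

print(sorted(inferred))  # all three candidates
print(sorted(filtered))  # only "result >= 0" remains
```

In this toy setting, the weak suite cannot distinguish the valid assertion from the two invalid ones, while a single input with `a < b` is enough to discard both false positives.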
Wed 20 May (displayed time zone: Seoul)
10:30 - 12:00 | Specification Inference & Model Checking | Journal-First Papers / Research Papers at Room 103 | Chair(s): Eunkyoung Jee (KAIST, South Korea)
10:30 | 25m | Talk | GRANDSLAM: Linearly Scalable Model Synthesis | Research Papers | Alexander Boll (University of Bern)
10:55 | 25m | Talk | Improving Dynamic Specification Inference with LLM-Generated Counterexamples | Research Papers | Agustín Balestra (University of Rio Cuarto, Argentina), Agustin Nolasco (University of Rio Cuarto), Facundo Molina (Complutense University of Madrid), Diego Garbervetsky (Departamento de Computación, FCEyN, UBA), Renzo Degiovanni (Luxembourg Institute of Science and Technology), Nazareno Aguirre (University of Rio Cuarto/CONICET, Argentina, and Guangdong Technion-Israel Institute of Technology, China)
11:15 | 25m | Talk | Systematic API Testing Through Model Checking and Executable Contracts | Research Papers
11:40 | 15m | Talk | Simulation-based Safety Assessment of Vehicle Characteristics Variations in Autonomous Driving Systems | Journal-First Papers | Qi Pan (Nanjing University of Aeronautics and Astronautics), Tiexin Wang (Nanjing University of Aeronautics and Astronautics), Jianwei Ma (Nanjing University of Aeronautics and Astronautics), Paolo Arcaini (National Institute of Informatics), Tao Yue (Beihang University)