Search-based Inference of Class Invariants: How far can Simulated Annealing take us?
Many formal verification and software testing techniques rely on repOK routines to verify the consistency and validity of software components with complex data representations. For a given class under analysis, a repOK encodes its so-called \emph{class invariant}, i.e., the state properties required on an instance for it to be considered a valid object of the class. While repOK routines can enable early error detection and simplify debugging, writing correct and complete repOKs can be challenging, even for advanced Large Language Models (LLMs). This paper explores the effectiveness of search-based techniques to automatically generate class invariants in the form of imperative repOKs.
We perform an evaluation using Express, a recent simulated annealing-based framework that automatically generates repOKs, on a set of 7 Java classes from widely-used libraries, and show that the framework can generate \emph{correct repOKs for all cases}, ensuring no valid instances are erroneously discarded. Furthermore, Express produces \emph{complete} repOKs in 3 out of 7 cases. Combining Express with an LLM further enhances effectiveness, generating \emph{correct} and \emph{complete} repOK routines for 5 out of 7 subjects. In addition, Express yields more complete invariants (i.e., it generates stronger properties) than the best related approach in 5 out of 7 cases. These results highlight the potential of search-based techniques for automating consistency checks in software systems, bridging gaps in current manual and automated software analyses.