Artifact of ICEBAR: Feedback-Driven Iterative Repair of Alloy Specifications
Automated program repair (APR) techniques have shown great success in automatically finding fixes for programs in programming languages such as C or Java. In this work, we focus on repairing formal specifications, in particular for the Alloy specification language. As opposed to most APR tools, our approach to repair Alloy specifications, named ICEBAR, does not use test-based oracles for patch assessment. Instead, ICEBAR relies on the use of property-based oracles, commonly found in Alloy specifications as predicates and assertions. These property-based oracles define stronger conditions for patch assessment, thus reducing the notorious overfitting issue caused by using test-based oracles, typically observed in APR contexts. Moreover, as assertions and predicates are inherent to Alloy, whereas test cases are not, our tool is potentially more appealing to Alloy users than test-based Alloy repair tools.
At a high level, ICEBAR is an iterative, counterexample-based process, that generates and validates repair candidates. ICEBAR receives a faulty Alloy specification with a failing property-based oracle, and uses Alloy’s counterexamples to build tests and feed ARepair, a test-based Alloy repair tool, in order to produce a repair candidate. The candidate is then checked against the property oracle for overfitting: if the candidate passes, a repair has been found; if not, further counterexamples are generated to construct tests and enhance the test suite, and the process is iterated. ICEBAR includes different mechanisms, with different degrees of reliability, to generate counterexamples from failing predicates and assertions.
Our evaluation shows that ICEBAR significantly improves over ARepair, in both reducing overfitting and improving the repair rate. Moreover, ICEBAR shows that iterative refinement allows us to significantly improve a state-of-the-art tool for automated repair of Alloy specifications without any modifications to the tool.