Preprocessing is What You Need: Understanding and Predicting the Complexity of SAT-based Uniform Random Sampling
Despite its NP-completeness, the Boolean satisfiability problem gave birth to highly efficient tools able to find solutions to a Boolean formula and compute their number. Boolean formulae compactly encode huge, constrained search spaces for variability-intensive systems, e.g., the possible configurations of the Linux kernel. These search spaces are generally too big to explore exhaustively, leading most testing approaches to sample a few solutions before analysing them. A desirable property of such samples is \textit{uniformity}: each solution should get the same selection probability. This property motivated the design of uniform random samplers, relying on SAT solvers and counters and achieving different tradeoffs between uniformity and scalability. Though we can observe their performance in practice, understanding the complexity these tools face and accurately predicting it is an under-explored problem. Indeed, structural metrics such as the number of variables and clauses involved in a formula poorly predict the sampling complexity. More elaborated ones, such as minimal independent support (MIS), are intractable to compute on large formulae. We provide an efficient parallel algorithm to compute a related metric, the \textit{number of equivalence classes}, and demonstrate that this metric is highly correlated to time and memory usage of uniform random sampling and model counting tools. We explore the role of formula preprocessing on various metrics and show its positive influence on correlations. Relying on these correlations, we train an efficient classifier (F1-score 0.97) to predict whether uniformly sampling a given formula will exceed a specified budget. Our results allow us to characterise the similarities and differences between (uniform) sampling, solving and counting.
Sun 14 AprDisplayed time zone: Lisbon change
11:00 - 12:30 | Automata and applicationsFormaliSE 2024 at Eugénio de Andrade Chair(s): Domenico Bianculli University of Luxembourg | ||
11:00 30mTalk | Contract Automata: A Specification Language for Mode-Based Systems FormaliSE 2024 Alexander Weigl Karlsruhe Institute for Technology, Joshua Bachmeier FZI Forschungszentrum Informatik, Bernhard Beckert Karlsruhe Institute of Technology, Mattias Ulbrich Karlsruhe Institute of Technology | ||
11:30 30mTalk | Finite Automata synthesis from Interactions FormaliSE 2024 Erwan Mahe Université Paris-Saclay, CEA, List, Boutheina Bannour Université Paris-Saclay, CEA, List, Christophe Gaston Université Paris-Saclay, CEA, List, Arnault Lapitre Université Paris-Saclay, CEA, List, Pascale Le Gall CentraleSupelec | ||
12:00 30mTalk | Preprocessing is What You Need: Understanding and Predicting the Complexity of SAT-based Uniform Random Sampling FormaliSE 2024 Olivier Zeyen University of Luxembourg, SnT, Maxime Cordy University of Luxembourg, Luxembourg, Gilles Perrouin Fonds de la Recherche Scientifique - FNRS & University of Namur, Mathieu Acher University of Rennes, France / Inria, France / CNRS, France / IRISA, France |