Program analyzers implement complex algorithms and, as any software, can contain bugs. Bugs in their implementation may lead to analyzers being imprecise and failing to verify safe programs, i.e., programs with no reachable error locations; or worse, analyzer bugs may lead to reporting unsound results by verifying unsafe programs, i.e., programs with reachable error locations.
In this paper, we propose a method to detect such bugs by generating constraint-based test oracles for analyzers. We re-purpose and extend Fuzzle, a tool for benchmarking fuzzers, in a tool called Minotaur. Minotaur generates C programs from SMT constraints, and based on the satisfiability of the constraints, derives whether the generated programs are safe or unsafe. For instance, for an unsafe program, an analyzer under test contains a soundness issue if it proves it safe. Using Minotaur, we found 30 unique soundness and precision issues in 11 well-known analyzers that reason about reachability properties.