FuzzSlice: Pruning False Positives in Static Analysis Warnings through Function-Level Fuzzing
Manual confirmation of static analysis reports is a daunting task. This is due to both the large number of warnings and the high density of false positives among them. Fuzzing techniques have been proposed to verify static analysis warnings. However, a major limitation is that fuzzing the whole project to reach all static analysis warnings is not feasible. This can take several days and exponential machine time to increase code coverage linearly. Therefore, we propose FuzzSlice, a novel framework that automatically prunes possible false positives among static analysis warnings. Unlike prior work that mostly focuses on confirming true positives among static analysis warnings, which inevitably requires end-to-end fuzzing, FuzzSlice focuses on ruling out potential false positives, which are the majority in static analysis reports. The key insight that we base our work on is that a warning that does not yield a crash when fuzzed at the function level in a given time budget is most likely a false positive. To achieve this, FuzzSlice first aims to generate compilable code slices at the function level. Then, FuzzSlice fuzzes these code slices instead of the entire binary to prune possible false positives. FuzzSlice is also unlikely to misclassify a true bug as a false positive because the crashing input can be reproduced by a fuzzer at the function level as well. We evaluate FuzzSlice on the Juliet synthetic dataset and real-world complex C projects: openssl, tmux and openssh-portable. Our evaluation shows that the ground truth in the Juliet dataset had 864 false positives which were all detected by FuzzSlice. For the open-source repositories, we were able to get the developers from two of these open-source repositories to independently label these warnings. FuzzSlice automatically identifies 33 possible false positives out of 53 false positives confirmed by developers in these two repositories. This implies that FuzzSlice can reduce the number of false positives by 62.26% in the open-source repositories and by 100% in the Juliet dataset.