Function Clustering-Based Fuzzing Termination: Toward Smarter Early Stopping
Fuzzing is a testing technique that generates large numbers of inputs in an attempt to crash the program under test. As large language models become more capable, the programs developed with their assistance grow as well, leading to an exponential increase in code complexity and function counts. Comprehensive fuzz testing of every function has therefore become increasingly difficult and resource-intensive. Current methods for deciding when to stop fuzzing rely on metrics such as function coverage, vulnerability function coverage, or crash count. However, these metrics fail to account for the scale of the functions under test. For example, function coverage may lead to excessive testing of non-critical functions, while vulnerability function coverage can cause premature termination if the estimated number of vulnerable functions is too low.
This paper introduces a novel fuzzing termination criterion based on function clustering. First, by leveraging a language model for function encoding and a multi-metric fusion algorithm for determining the number of clusters, we establish a relationship between function clustering and vulnerability distribution. Second, our experiments on eight function libraries, in which we compare our criterion with three existing methods, demonstrate that the proposed termination criterion significantly improves testing efficiency: it reduces fuzzing time by 1.4–7.2 hours (5–30%) across different configurations while keeping bug loss minimal (0.25 bugs on average), outperforming existing criteria such as those based on vulnerability function coverage.