Context: Static Application Security Testing Tools (SASTTs) identify software vulnerabilities to support the security and reliability of software applications. Interestingly, several studies have suggested that alternative solutions may be more effective than SASTTs due to their tendency to generate false alarms, commonly referred to as low Precision.
Aim: We aim to comprehensively evaluate SASTTs, setting a reliable benchmark for assessing and finding gaps in vulnerability identification mechanisms based on SASTTs or alternatives.
Method: Our SASTTs evaluation is based on a controlled, though synthetic, Java codebase, it involves an assessment of 1.5 million test executions, and it features innovative methodological features such as effort-aware accuracy metrics and method-level analysis.
Results: Our findings reveal that SASTTs detect a tiny range of vulnerabilities. In contrast to prevailing wisdom, SASTTs exhibit high Precision while falling short in Recall. Conclusions: The paper suggests that enhancing Recall, alongside expanding the spectrum of detected vulnerability types, should be the primary focus for improving SASTTs or alternative approaches, such as machine learning-based vulnerability identification solutions.