Actionable Warning Is Not Enough: Recommending Valid Actionable Warnings with Weak Supervision
The use of static analysis tools has gained increasing popularity among developers in the last few years. However, the widespread adoption of static analysis tools is hindered by their high false alarm rates (up to 90%). Previous studies have introduced the concept of actionable warnings and built machine-learning method to distinguish actionable warnings from false alarms. However, according to our empirical observation, the current assumption used for actionable warning(s) collection is rather shaky and inaccurate, leading to a large number of invalid actionable warnings. To address this problem, in this study, we build the first large actionable warning dataset by mining 68,274 reversions from Top-500 GitHub C repositories, we then take one step further by assigning each actionable warning a weak label regarding its likelihood of being a real bug. Following that, we propose a two-stage framework called ACWRecommender to automatically identify actionable warnings and recommend the actionable warnings with high probability to be real bugs (AWHB). Our approach warms up the pre-trained model UniXcoder to identify actionable warnings (coarse-grained detection stage) and rerank AWHB to the top by weakly supervised learning (fine-grained reranking stage). Experimental results show that our proposed model outperforms several baselines by a large margin in terms of nDCG and MRR for AWHB recommendation. Moreover, we ran our tool on 10 randomly selected projects and manually checked the top-ranked warnings from 2,197 reported warnings, we reported top-60 recommended warnings to developers, 27 of them were already confirmed by developers as real bugs. Developers can quickly find real bugs among the massive amount of reported warnings, which verifies the practical usage of our tool.