ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia

It is common practice for OSS users to rely on security advisories to monitor newly disclosed OSS vulnerabilities and the patches that remediate them. However, vulnerability fixes are commonly publicly available one week before the corresponding advisory, and this time gap can give attackers an advantage in developing exploits. Hence, it is important for OSS users to sense a fix as early as possible so that the vulnerability can be remediated before it is exploited. Under vulnerability disclosure policies, vulnerabilities are normally fixed silently, which means the fix should not reveal any vulnerability information. In this case, even if the fix is identified, it is hard for OSS users to understand the vulnerability and evaluate its impact. Therefore, for better early sensing of vulnerabilities, identifying silent fixes and providing the corresponding explanations, e.g., the common weakness enumeration (CWE) category and the exploitability rating, are equally important.

However, identifying silent fixes and providing explanations is challenging because the available data is limited and diverse. To tackle this challenge, we propose CoLeFunDa, a framework consisting of a Contrastive Learner (CoLe) and FunDa, a novel approach for Function change Data augmentation. FunDa first increases the fix data (i.e., code changes) at the function level with unsupervised and supervised strategies. The contrastive learner then uses contrastive learning to effectively train a function change encoder, FCBERT, from the diverse fix data. Finally, we fine-tune FCBERT on three downstream tasks: automated silent fix identification, CWE category classification, and exploitability rating classification. Our results show that CoLeFunDa outperforms all the state-of-the-art baselines in all downstream tasks. We also conduct a survey to verify the effectiveness of CoLeFunDa in practical usage. The results show that CoLeFunDa can categorize 62.5% (25 out of 40) of the CVEs with correct CWE categories within its top 2 recommendations.
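To make the contrastive learning step more concrete, the sketch below computes an InfoNCE-style loss over a toy batch of embedding pairs. The abstract does not state which contrastive objective CoLeFunDa uses, so the choice of InfoNCE, the temperature value, and the names `cosine` and `info_nce_loss` are all assumptions for illustration, not the authors' implementation. Each anchor stands for the embedding of one function change, and its positive stands for the embedding of an augmented sample of the same fix.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    anchors[i] should be similar to positives[i] and dissimilar to
    every other positives[j] in the batch, which act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy 2-D embeddings: two function changes and their augmented views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
shuffled_loss = info_nce_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

In this toy batch, `aligned_loss` is far smaller than `shuffled_loss`, because the first call pairs each anchor with its nearby positive while the second pairs it with the wrong one. Minimizing such a loss pushes an encoder to place changes from the same fix close together, which is the general idea behind training an encoder from diverse fix data.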