Keyword Extraction From Specification Documents for Planning Security Mechanisms
Software development companies heavily invest both time and money to provide post-production support to fix security vulnerabilities in their products. Current techniques identify vulnerabilities from source code using static and dynamic analyses. However, this does not help integrate security mechanisms early in the architectural design phase. We develop VDocScan, a technique for predicting vulnerabilities based on specification documents, even before the development stage. We evaluate VDocScan using an extensive dataset of CVE vulnerability reports mapped to over 3600 product documentations. An evaluation of 8 CWE vulnerability pillars shows that even interpretable whitebox classifiers predict vulnerabilities with up to 61.1% precision and 78% recall. Further using strategies to improve the relevance of extracted keywords, addressing class imbalance, segregating products into categories such as Operating Systems, Web applications, and Hardware, and using blackbox ensemble models such as the random forest classifier improves the performance to 96% precision and 91.1% recall. The high precision and recall shows that VDocScan can anticipate vulnerabilities detected in a product’s lifetime ahead of time during the Design phase to incorporate necessary security mechanisms. The performance is consistently high for vulnerabilities with the mode of introduction: architecture and design.