Unsupervised Labeling and Extraction of Phrase-based Concepts in Vulnerability Descriptions
People usually describe the key characteristics of software vulnerabilities in natural language mixed with domain-specific names and concepts. This textual nature poses a significant challenge for the automatic analysis of vulnerabilities. Automatically extracting key vulnerability aspects is highly desirable, but it demands significant effort to manually label data for model training. In this paper, we propose an unsupervised approach to label and extract important vulnerability concepts in textual vulnerability descriptions (TVDs). We focus on three types of phrase-based vulnerability concepts (root cause, attack vector and impact), as they are much more difficult to label and extract than name- or number-based entities (i.e., vendor, product and version). Our approach is based on a key observation: phrases of the same type, no matter how they differ in sentence structure and phrase expression, usually share syntactically similar paths in the sentence parse trees. Therefore, we propose two path representations (absolute paths and relative paths) and use an auto-encoder to encode such syntactic similarities. To address the discrete nature of our paths, we enhance the traditional Variational Auto-encoder (VAE) with the Gumbel-Max trick for categorical data distributions, thus creating a Categorical VAE (CaVAE). In the latent space of absolute and relative paths, we further apply FIt-SNE and clustering techniques to generate clusters of same-type concepts. Our evaluation confirms the effectiveness of our CaVAE for encoding path representations and the accuracy of the vulnerability concepts in the resulting clusters. In a concept classification task, our vulnerability concepts, labeled without supervision, outperform the two manually labeled datasets from previous work.
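The abstract credits the Gumbel-Max trick with letting a VAE model categorical (discrete) path data. The sketch below illustrates that trick in isolation, in plain Python; it is not the authors' CaVAE, and the function names, the example distribution and the temperature value are hypothetical. `gumbel_max_sample` draws an exact categorical sample as the argmax of Gumbel-perturbed logits, while `gumbel_softmax` shows the Gumbel-Softmax relaxation that is commonly paired with the trick so that gradients can flow during training; whether CaVAE uses that particular relaxation is an assumption.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

# Illustrative sketch only: names and parameters are hypothetical, not taken from CaVAE.


def sample_gumbel(rng: random.Random) -> float:
    """Draw one sample from the standard Gumbel(0, 1) distribution via the inverse CDF."""
    u = rng.random()  # uniform in [0.0, 1.0)
    while u == 0.0:  # log(0) is undefined, so redraw the (rare) exact zero
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits: Sequence[float], rng: random.Random) -> int:
    """Gumbel-Max trick: argmax_i(logits[i] + g_i) is an exact sample from softmax(logits)."""
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxation: softmax((logits + g) / tau), which approaches one-hot as tau -> 0."""
    perturbed = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    peak = max(perturbed)  # subtract the max before exp for numerical stability
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3-way categorical distribution, e.g. over path-token types.
    logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    n = 20000
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n))
    print({k: round(counts[k] / n, 3) for k in sorted(counts)})
    print([round(p, 3) for p in gumbel_softmax(logits, tau=0.5, rng=rng)])
```

Over many draws, the empirical frequencies from `gumbel_max_sample` should approach 0.7, 0.2 and 0.1. Lowering `tau` pushes the relaxed vector toward one-hot, typically at the cost of higher-variance gradients in a real training loop.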
Thu 18 Nov (displayed time zone: Hobart)
21:00 - 22:00
21:00 (20m talk, Research Papers): Learning Domain-Specific Edit Operations from Model Repositories with Frequent Subgraph Mining. Christof Tinnes (Saarland University), Timo Kehrer (Humboldt University of Berlin), Mitchell Joblin (Siemens AG), Uwe Hohenstein (Siemens AG), Andreas Biesdorf (Siemens AG), Sven Apel (Saarland University)
21:20 (20m talk, Research Papers): Unsupervised Labeling and Extraction of Phrase-based Concepts in Vulnerability Descriptions. Sofonias Yitagesu (Tianjin University), Zhenchang Xing (Australian National University), Xiaowang Zhang (Tianjin University), Zhiyong Feng (Tianjin University), Xiaohong Li (Tianjin University), Linyi Han (Tianjin University)
21:40 (20m talk, Research Papers): A Compositional Deadlock Detector for Android Java. James Brotherston, Paul Brunet (University College London), Nikos Gorogiannis (Facebook), Max Kanovich (University College London)