SIExVulTS: Sensitive Information Exposure Vulnerability Detection System using Transformer Models and Static Analysis (ESEIW 2025 - ESEM - Technical Track)

Sun 28 September - Fri 3 October 2025

Who

Kyler Katz, Sara Moshtari, Ibrahim Mujhid, Mehdi Mirakhorli

Track

ESEIW 2025 ESEM - Technical Track

Abstract

Sensitive Information Exposure (SIEx) vulnerabilities (CWE-200) remain a persistent and under-addressed threat across software systems, often leading to serious security breaches. Existing detection tools rarely target the diverse subcategories of CWE-200 or provide context-aware analysis of code-level data flows. In this paper, we present SIExVulTS, a novel vulnerability detection system that integrates transformer-based models with static analysis to identify and verify sensitive information exposure in Java applications. SIExVulTS employs a three-stage architecture: (1) an Attack Surface Detection Engine that uses sentence embeddings to identify sensitive variables, strings, comments, and sinks, with an average F1 score above 93%; (2) an Exposure Analysis Engine that instantiates CodeQL queries aligned with the CWE-200 hierarchy, achieving an F1 score of 85.71%; and (3) a Flow Verification Engine that uses GraphCodeBERT to semantically validate source-to-sink flows, raising precision from 22.61% to 87.23%. We evaluate SIExVulTS on three curated datasets: real-world CVEs, a benchmark set of synthetic CWE-200 examples, and labeled flows from 31 open-source projects. In addition, SIExVulTS uncovered three previously unknown CVEs in major Apache projects. These results demonstrate its effectiveness and practical applicability for improving software security against sensitive data exposure.
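To make the reported metrics concrete, the following is a minimal sketch of how a verification stage that discards spurious source-to-sink flows can raise precision. It is not the authors' implementation: the `Flow` record, the `verifier_accepts` flag (a stand-in for a learned model's decision), and the five toy flows are all hypothetical, and the numbers it produces are unrelated to the results in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A candidate source-to-sink flow (hypothetical record for illustration)."""
    source: str
    sink: str
    is_true_exposure: bool  # ground-truth label, assumed known for evaluation
    verifier_accepts: bool  # stand-in for a learned verifier's decision


def precision(flows):
    """Fraction of reported flows that are real exposures."""
    if not flows:
        return 0.0
    return sum(f.is_true_exposure for f in flows) / len(flows)


def recall(flows, total_true):
    """Fraction of all real exposures that appear among the reported flows."""
    if total_true == 0:
        return 0.0
    return sum(f.is_true_exposure for f in flows) / total_true


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def verify(flows):
    """Keep only the flows the (stand-in) verifier accepts."""
    return [f for f in flows if f.verifier_accepts]


# Toy candidates, as a static analysis pass might report them (made-up data).
candidates = [
    Flow("password", "logger.info", True, True),
    Flow("apiKey", "response.write", True, True),
    Flow("userName", "logger.debug", False, False),
    Flow("buildVersion", "System.out.println", False, False),
    Flow("sessionLabel", "logger.info", False, True),  # verifier mistake
]
total_true = sum(f.is_true_exposure for f in candidates)

verified = verify(candidates)
p_before = precision(candidates)
p_after = precision(verified)
r_after = recall(verified, total_true)
f1_after = f1(p_after, r_after)

print(f"precision before verification: {p_before:.2%}")
print(f"precision after verification:  {p_after:.2%}")
print(f"recall after verification:     {r_after:.2%}")
print(f"F1 after verification:         {f1_after:.2%}")
```

In this toy setup the filter removes two of the three false positives without dropping a true exposure, so precision rises while recall is unchanged. A real verifier can also reject true flows, which is why precision and recall are usually reported together.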

Kyler Katz

University of Hawaii at Manoa