Vulnerability-Triggering Test Case Generation from Third-Party Libraries (FORGE 2025 - Research Papers)

Who

Yi Gao, Xing Hu, Zirui Chen, Tongtong Xu, Xiaohu Yang

Track

FORGE 2025 Research Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 27 Apr 2025 16:24 - 16:36 at 207 - Session2: FM for Software Quality Assurance & Testing Chair(s): Feifei Niu

Abstract

Open-source third-party libraries are widely used in software development because they save substantial time and resources. However, publicly disclosed vulnerabilities in these libraries are a significant concern. Existing automated vulnerability detection tools often suffer from false positives and fail to accurately assess whether inputs capable of triggering a vulnerability can propagate from client projects to the vulnerable code in libraries. In this paper, we propose a novel approach called VulEUT (Vulnerability Exploit Unit Test Generation), which combines vulnerability exploitation reachability analysis with LLM-based unit test generation. VulEUT is designed to automatically verify the exploitability of vulnerabilities in third-party libraries commonly used in client software projects. VulEUT first analyzes the client projects to determine the reachability of vulnerability conditions. It then leverages a Large Language Model (LLM) to generate unit tests that confirm the vulnerability. To evaluate the effectiveness of VulEUT, we collect 32 vulnerabilities from various third-party libraries and conduct experiments on 70 real client projects. We also compare our approach with two representative tools, TRANSFER and VESTA. Our results demonstrate the effectiveness of VulEUT: 229 out of 292 generated unit tests successfully confirm vulnerability exploitation across the 70 client projects, outperforming the baselines by 24%.
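
The reachability step described in the abstract can be illustrated with a minimal sketch. This is not VulEUT's implementation, which the abstract does not detail; it only assumes that reachability can be approximated as a path search over a call graph from a client entry point to a vulnerable library function. The call graph, the function names (`client.Main.handle`, `lib.Yaml.load`), and the helper `find_call_path` are all hypothetical. The final line works out the confirmation rate from the reported figures (229 of 292 tests).

```python
from collections import deque


def find_call_path(call_graph, entry, target):
    """Breadth-first search: return one shortest call path from entry to target, or None."""
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical call graph: caller -> list of callees (illustrative names only).
call_graph = {
    "client.Main.handle": ["client.Parser.parse", "client.Log.write"],
    "client.Parser.parse": ["lib.Yaml.load"],  # assumed vulnerable library function
    "client.Log.write": [],
}

# A path exists from the client entry point to the vulnerable function.
path = find_call_path(call_graph, "client.Main.handle", "lib.Yaml.load")

# No path exists from the logging helper, so the vulnerability is unreachable from it.
unreachable = find_call_path(call_graph, "client.Log.write", "lib.Yaml.load")

# Share of generated unit tests that confirmed exploitation, from the reported counts.
confirmation_rate = 229 / 292
```

In such a sketch, a returned path is the kind of context that could be handed to an LLM when asking it to write a unit test that drives input along that path; a `None` result means no test is worth generating for that entry point.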

Yi Gao

Zhejiang University

Xing Hu

Zhejiang University

China

Zirui Chen

Tongtong Xu

Nanjing University

China

Xiaohu Yang

Zhejiang University

China

Session Program

Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30	Session2: FM for Software Quality Assurance & Testing (Research Papers / Data and Benchmarking) at 207. Chair(s): Feifei Niu, University of Ottawa

16:00 12m Long-paper		Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements. Track: Research Papers. Seyed Moein Abtahi (Ontario Tech University), Akramul Azim (Ontario Tech University)
16:12 12m Long-paper		Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models. Track: Research Papers. Marc Bruni (University of Applied Sciences and Arts Northwestern Switzerland), Fabio Gabrielli (University of Applied Sciences and Arts Northwestern Switzerland), Mohammad Ghafari (TU Clausthal), Martin Kropp (University of Applied Sciences and Arts Northwestern Switzerland)
16:24 12m Long-paper		Vulnerability-Triggering Test Case Generation from Third-Party Libraries. Track: Research Papers. Yi Gao (Zhejiang University), Xing Hu (Zhejiang University), Zirui Chen, Tongtong Xu (Nanjing University), Xiaohu Yang (Zhejiang University)
16:36 6m Short-paper		Microservices Performance Testing with Causality-enhanced Large Language Models. Track: Research Papers. Cristian Mascia (University of Naples Federico II), Roberto Pietrantuono (Università di Napoli Federico II), Antonio Guerriero (Università di Napoli Federico II), Luca Giamattei (Università di Napoli Federico II), Stefano Russo (Università di Napoli Federico II)
16:42 6m Short-paper		MaRV: A Manually Validated Refactoring Dataset. Track: Data and Benchmarking. Henrique Gomes Nunes (Universidade Federal de Minas Gerais), Tushar Sharma (Dalhousie University), Eduardo Figueiredo (Federal University of Minas Gerais)
16:48 6m Short-paper		PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection. Track: Data and Benchmarking. Domenico Cotroneo (University of Naples Federico II), Giuseppe De Rosa (University of Naples Federico II), Pietro Liguori (University of Naples Federico II)
16:54 6m Short-paper		The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models. Track: Data and Benchmarking. Jonathan Katzy (Delft University of Technology), Răzvan Mihai Popescu (Delft University of Technology), Arie van Deursen (TU Delft), Maliheh Izadi (Delft University of Technology)
17:00 12m Long-paper		ELDetector: An Automated Approach Detecting Endless-loop in Mini Programs. Track: Research Papers. Nan Hu (Xi'an Jiaotong University), Ming Fan (Xi'an Jiaotong University), Jingyi Lei (Xi'an Jiaotong University), Jiaying He (Xi'an Jiaotong University), Zhe Hou (China Mobile System Integration Co.)
17:12 12m Long-paper		Testing Android Third Party Libraries with LLMs to Detect Incompatible APIs. Track: Research Papers. Tarek Mahmud (Texas State University), Bin Duan (University of Queensland), Meiru Che (Central Queensland University), Anne Ngu (Texas State University), Guowei Yang (University of Queensland)