FORGE 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Sun 27 Apr 2025 16:42 - 16:48 at 207 - Session 2: FM for Software Quality Assurance & Testing Chair(s): Feifei Niu

Recent studies have assessed the use of state-of-the-art models (e.g., foundation models) for code refactoring, but the effectiveness of these pre-trained models remains limited. While model performance can be improved with additional training data, the lack of high-quality data poses a significant challenge. This paper introduces the MaRV dataset, which contains 693 manually evaluated code pairs extracted from 126 GitHub Java repositories, representing four types of refactoring. The dataset also includes metadata describing the refactored elements. Each code pair was manually evaluated by two reviewers drawn from a pool of 40 participants. MaRV is continuously evolving, and a web-based tool is available for evaluating refactoring representations. By providing high-quality data, the dataset's primary aim is to improve the accuracy and reliability of state-of-the-art models on refactoring tasks such as refactoring candidate identification and refactored code generation.

Sun 27 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Session 2: FM for Software Quality Assurance & Testing (Research Papers / Data and Benchmarking) at 207
Chair(s): Feifei Niu University of Ottawa
16:00
12m
Long-paper
Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements
Research Papers
Seyed Moein Abtahi Ontario Tech University, Akramul Azim Ontario Tech University
16:12
12m
Long-paper
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models
Research Papers
Marc Bruni University of Applied Sciences and Arts Northwestern Switzerland, Fabio Gabrielli University of Applied Sciences and Arts Northwestern Switzerland, Mohammad Ghafari TU Clausthal, Martin Kropp University of Applied Sciences and Arts Northwestern Switzerland
Pre-print
16:24
12m
Long-paper
Vulnerability-Triggering Test Case Generation from Third-Party Libraries
Research Papers
Yi Gao Zhejiang University, Xing Hu Zhejiang University, Zirui Chen, Tongtong Xu Nanjing University, Xiaohu Yang Zhejiang University
16:36
6m
Short-paper
Microservices Performance Testing with Causality-enhanced Large Language Models
Research Papers
Cristian Mascia University of Naples Federico II, Roberto Pietrantuono University of Naples Federico II, Antonio Guerriero University of Naples Federico II, Luca Giamattei University of Naples Federico II, Stefano Russo University of Naples Federico II
16:42
6m
Short-paper
MaRV: A Manually Validated Refactoring Dataset
Data and Benchmarking
Henrique Gomes Nunes Universidade Federal de Minas Gerais, Tushar Sharma Dalhousie University, Eduardo Figueiredo Federal University of Minas Gerais
16:48
6m
Short-paper
PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection
Data and Benchmarking
Domenico Cotroneo University of Naples Federico II, Giuseppe De Rosa University of Naples Federico II, Pietro Liguori University of Naples Federico II
16:54
6m
Short-paper
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models
Data and Benchmarking
Jonathan Katzy Delft University of Technology, Răzvan Mihai Popescu Delft University of Technology, Arie van Deursen TU Delft, Maliheh Izadi Delft University of Technology
17:00
12m
Long-paper
ELDetector: An Automated Approach Detecting Endless-loop in Mini Programs
Research Papers
Nan Hu Xi’an Jiaotong University, Ming Fan Xi'an Jiaotong University, Jingyi Lei Xi'an Jiaotong University, Jiaying He Xi'an Jiaotong University, Zhe Hou China Mobile System Integration Co.
17:12
12m
Long-paper
Testing Android Third Party Libraries with LLMs to Detect Incompatible APIs
Research Papers
Tarek Mahmud Texas State University, bin duan University of Queensland, Meiru Che Central Queensland University, Anne Ngu Texas State University, Guowei Yang University of Queensland