FORGE 2025 - Data and Benchmarking

Sun 27 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Session 1: FM for Code Generation [Research Papers / Data and Benchmarking] at 207. Chair(s): Lili Wei (McGill University)

14:00 12m Long-paper		RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [Research Papers] Huy Nhat Phan (FPT Software AI Center), Hoang Nhat Phan (Nanyang Technological University), Tien N. Nguyen (University of Texas at Dallas), Nghi D. Q. Bui (Salesforce Research)
14:12 12m Long-paper		SoTaNa: An Open-Source Software Engineering Instruction-Tuned Model [Research Papers] Ensheng Shi (Xi’an Jiaotong University), Yanlin Wang (Sun Yat-sen University), Fengji Zhang (Microsoft Research Asia), Bei Chen (Microsoft Research Asia), Hongyu Zhang (Chongqing University), Yanli Wang (Sun Yat-sen University), Daya Guo (Sun Yat-sen University), Lun Du (Microsoft Research), Shi Han (Microsoft Research), Dongmei Zhang (Microsoft Research), Hongbin Sun (Xi’an Jiaotong University)
14:24 12m Long-paper		Automated Codebase Reconciliation using Large Language Models [Research Papers] Aneri Gandhi (University of Toronto), Sanjukta De (Advanced Micro Devices), Marsha Chechik (University of Toronto), Vinay Pandit (Advanced Micro Devices), Max Kiehn (Advanced Micro Devices), Matthieu Chan Chee (Advanced Micro Devices), Yonas Bedasso (Advanced Micro Devices)
14:36 12m Long-paper		AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code [Research Papers] Lola Solovyeva (University of Twente), Sophie Weidmann (University of Twente), Fernando Castor (University of Twente)
14:48 6m Short-paper		SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation [Data and Benchmarking] Ivan Petrukha (MacPaw), Yana Kurliak (MacPaw), Nataliia Stulova (MacPaw)
14:54 6m Short-paper		SE Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering [Research Papers] Zhimin Zhao (Queen's University)
15:00 12m Long-paper		PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback [Research Papers] Yun Peng (The Chinese University of Hong Kong), Akhilesh Deepak Gotmare (Salesforce Research), Michael Lyu (The Chinese University of Hong Kong), Caiming Xiong (Salesforce Research), Silvio Savarese (Salesforce Research), Doyen Sahoo (Salesforce Research)
15:12 6m Short-paper		HyRACC: A Hybrid Retrieval-Augmented Framework for More Efficient Code Completion [Research Papers] Chuanyi Li (Nanjing University), Jiwei Shang (Nanjing University), Yi Feng (Nanjing University), Bin Luo (Nanjing University)
15:18 6m Short-paper		OptCodeTrans: Boost LLMs on Low-Resource Programming Language Translation [Research Papers] Jianbo Lin (Nanjing University), Yi Shen (Nanjing University), Chuanyi Li (Nanjing University), Changan Niu (Software Institute, Nanjing University), Bin Luo (Nanjing University)

16:00 - 17:30	Session 2: FM for Software Quality Assurance & Testing [Research Papers / Data and Benchmarking] at 207. Chair(s): Feifei Niu (University of Ottawa)

16:00 12m Long-paper		Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements [Research Papers] Seyed Moein Abtahi (Ontario Tech University), Akramul Azim (Ontario Tech University)
16:12 12m Long-paper		Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models [Research Papers] Marc Bruni (University of Applied Sciences and Arts Northwestern Switzerland), Fabio Gabrielli (University of Applied Sciences and Arts Northwestern Switzerland), Mohammad Ghafari (TU Clausthal), Martin Kropp (University of Applied Sciences and Arts Northwestern Switzerland) [Pre-print]
16:24 12m Long-paper		Vulnerability-Triggering Test Case Generation from Third-Party Libraries [Research Papers] Yi Gao (Zhejiang University), Xing Hu (Zhejiang University), Zirui Chen, Tongtong Xu (Nanjing University), Xiaohu Yang (Zhejiang University)
16:36 6m Short-paper		Microservices Performance Testing with Causality-enhanced Large Language Models [Research Papers] Cristian Mascia (University of Naples Federico II), Roberto Pietrantuono (University of Naples Federico II), Antonio Guerriero (University of Naples Federico II), Luca Giamattei (University of Naples Federico II), Stefano Russo (University of Naples Federico II)
16:42 6m Short-paper		MaRV: A Manually Validated Refactoring Dataset [Data and Benchmarking] Henrique Gomes Nunes (Federal University of Minas Gerais), Tushar Sharma (Dalhousie University), Eduardo Figueiredo (Federal University of Minas Gerais)
16:48 6m Short-paper		PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection [Data and Benchmarking] Domenico Cotroneo (University of Naples Federico II), Giuseppe De Rosa (University of Naples Federico II), Pietro Liguori (University of Naples Federico II)
16:54 6m Short-paper		The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models [Data and Benchmarking] Jonathan Katzy (Delft University of Technology), Răzvan Mihai Popescu (Delft University of Technology), Arie van Deursen (TU Delft), Maliheh Izadi (Delft University of Technology)
17:00 12m Long-paper		ELDetector: An Automated Approach Detecting Endless-loop in Mini Programs [Research Papers] Nan Hu (Xi’an Jiaotong University), Ming Fan (Xi'an Jiaotong University), Jingyi Lei (Xi'an Jiaotong University), Jiaying He (Xi'an Jiaotong University), Zhe Hou (China Mobile System Integration Co.)
17:12 12m Long-paper		Testing Android Third Party Libraries with LLMs to Detect Incompatible APIs [Research Papers] Tarek Mahmud (Texas State University), Bin Duan (University of Queensland), Meiru Che (Central Queensland University), Anne Ngu (Texas State University), Guowei Yang (University of Queensland)

Accepted Papers

	MaRV: A Manually Validated Refactoring Dataset [Data and Benchmarking] Henrique Gomes Nunes, Tushar Sharma, Eduardo Figueiredo
	PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection [Data and Benchmarking] Domenico Cotroneo, Giuseppe De Rosa, Pietro Liguori
	SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation [Data and Benchmarking] Ivan Petrukha, Yana Kurliak, Nataliia Stulova
	The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models [Data and Benchmarking] Jonathan Katzy, Răzvan Mihai Popescu, Arie van Deursen, Maliheh Izadi

Call for Papers

High-quality datasets and robust evaluation frameworks are essential for developing and evaluating foundation models (FM). The Data and Benchmarking track serves as a forum for high-quality research on machine learning datasets and on benchmarking results that go beyond traditional evaluation metrics. In particular, the track encourages work that advances data quality and benchmarking standards, facilitating the development and assessment of FM for software engineering (SE).

Scope

This track accepts two types of submissions in the context of software engineering: (1) data papers and (2) benchmarking papers.

1. Data papers are expected to include:

New datasets, or carefully and thoughtfully designed (collections of) datasets based on previously available data.
Data generators and reinforcement learning environments.
Data-centric AI methods and tools (e.g., to measure and improve data quality or utility), or studies in data-centric AI that bring important new insights.
Advanced practices in data collection and curation that are of general interest, even if the data itself cannot be shared.
Frameworks for responsible dataset development, audits of existing datasets, and studies identifying significant problems with existing datasets and their use.

2. Benchmarking papers are expected to include:

Benchmarks on new or existing metrics, as well as benchmarking tools.
Systematic analyses of existing systems on novel datasets that yield important new insights.

Criteria

We aim for an evaluation process specifically suited to data and benchmarking papers.

1. For data papers:

value, usefulness, and reusability of the datasets or tools;
quality of the presentation;
clarity of the relation to related work and of its relevance to software engineering;
accessibility of the datasets or tools, i.e., the data can be found and obtained without a personal request, and any required code is open source.

2. For benchmarking papers:

the relevance of the proposed work for the FORGE audience;
the originality of its underlying ideas;
the quality of the presentation;
the usefulness of the results;
the potential reach of the proposed tool, metric, or dataset.

Submission Instructions

Regardless of the paper type (see Scope), all papers submitted to this track are limited to a maximum of 4 pages, plus 1 additional page of references.

We encourage all authors to share (anonymized and curated) data and artifacts to increase reproducibility and replicability. Sharing research artifacts is not mandatory for submission or acceptance. However, sharing is expected to be the default, and not sharing must be justified.

All submissions must be in PDF. The page limit is strict, and it will not be possible to purchase additional pages at any point in the process (including after acceptance).

Submissions must conform to the IEEE conference proceedings template, as specified in the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type). LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without the compsoc or compsocconf options.

Note that we use double-anonymous reviewing. Be sure to remove the list of authors from the submitted paper. If citing your own prior work, do so in the third person to obscure your relationship to it. For advice, guidance, and explanation about the double-anonymous review process, see the ICSE Research Track’s Q&A page.

All papers must be written in English. Authors are strongly encouraged to use the HotCRP format checker on their submissions. Note that the format checker is not perfect. In particular, it may flag small fonts in figures, footnotes, or references. As long as the main text follows the requested format and the figures are readable, the paper will not be rejected for format violations. If you have any concerns, please contact the program chairs.

All papers should be made accessible to people with disabilities. Some guidelines from the SIGACCESS community are available here: https://assets21.sigaccess.org/creating_accessible_pdfs.html.

Please submit your paper on HotCRP: https://forge25-benchmarking.hotcrp.com/

Important dates (AoE)

Full paper submission deadline: Dec 13, 2024
Author notification: Jan 14, 2025
Camera-ready deadline: Feb 21, 2025

09:00 - 10:30	FORGE2025 Opening / Keynote [Keynotes] at 207. Chair(s): David Lo (Singapore Management University), Denys Poshyvanyk (William & Mary)

09:00 10m Day opening		Introduction from the Chairs [Keynotes] Xin Xia (Huawei), David Lo (Singapore Management University), Cuiyun Gao (Harbin Institute of Technology), Denys Poshyvanyk (William & Mary)
09:10 60m Keynote		Keynote: LLMs (for code) are often wrong. What to do? [Keynotes] Prem Devanbu (University of California at Davis)

11:00 - 12:30	FORGE2025 Panel / Keynote [Keynotes] at 207. Chair(s): Denys Poshyvanyk (William & Mary)

11:00 60m Keynote		Keynote: Trust No Bot? Forging Confidence in AI for Software Engineering [Keynotes] Thomas Zimmermann (University of California, Irvine)
12:00 30m Panel		Panel Discussion [Panel]

09:00 - 10:30	FORGE2025 Keynote & Session 3: Collaborative Software Development [Research Papers / Keynotes] at 207. Chair(s): Xin Xia (Huawei), Yuan Tian (Queen's University, Kingston, Ontario)

09:00 60m Keynote		Keynote: Large language models for agentic software engineering [Keynotes] Graham Neubig (Carnegie Mellon University)
10:00 12m Long-paper		AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology [Research Papers] Minh Nguyen Huynh (FPT Software AI Center), Thang Phan Chau (FPT Software AI Center), Phong X. Nguyen (FPT Software AI Center), Nghi D. Q. Bui (Salesforce Research)
10:12 12m Long-paper		Enhancing Pull Request Reviews: Leveraging Large Language Models to Detect Inconsistencies Between Issues and Pull Requests [Research Papers] Ali Tunahan Işık (Bilkent University), Hatice Kübra Çağlar (Bilkent University), Eray Tüzün (Bilkent University)

11:00 - 12:30	Session 4: Human-AI Collaboration & Legal Aspects of using FM [Research Papers / Industry Papers] at 207. Chair(s): Zhenhao Li (York University)

11:00 12m Long-paper		Extracting Fix Ingredients using Language Models [Research Papers] Julian Prenner (Free University of Bozen-Bolzano), Romain Robbes (CNRS, LaBRI, University of Bordeaux)
11:12 12m Long-paper		CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning [Research Papers] Cuong Chi Le (FPT Software AI Center), Hoang Nhat Phan (Nanyang Technological University), Huy Nhat Phan (FPT Software AI Center), Tien N. Nguyen (University of Texas at Dallas), Nghi D. Q. Bui (Salesforce Research)
11:24 12m Long-paper		Addressing Specific and Complex Scenarios in Semantic Parsing [Research Papers] Yu Wang (Xi'an Jiaotong University), Ming Fan (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University)
11:36 12m Long-paper		Skill over Scale: The Case for Medium, Domain-Specific Models for SE [Research Papers] Manisha Mukherjee (Carnegie Mellon University), Vincent J. Hellendoorn (Carnegie Mellon University) [Pre-print]
11:48 12m Long-paper		Resource-Efficient & Effective Code Summarization [Research Papers] Saima Afrin (William & Mary), Joseph Call (William & Mary), Khai Nguyen (William & Mary), Oscar Chaparro (William & Mary), Antonio Mastropaolo (William and Mary, USA)
12:00 6m Short-paper		How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering [Research Papers] Christoph Treude (Singapore Management University), Marco Gerosa (Northern Arizona University) [Pre-print]
12:06 6m Short-paper		“So what if I used GenAI?” - Legal Implications of Using GenAI in Software Engineering Research [Research Papers] Gouri Ginde (Deshpande) (University of Calgary) [Pre-print]
12:12 6m Short-paper		Evaluating the Ability of GPT-4o to Generate Verifiable Specifications in VeriFast [Research Papers] Marilyn Rego (Purdue University), Wen Fan (Purdue University), Xin Hu (University of Michigan - Ann Arbor), Sanya Dod, Zhaorui Ni (Purdue University), Danning Xie (Purdue University), Jenna DiVincenzo (Wise) (Purdue University), Lin Tan (Purdue University)
12:18 6m Short-paper		Towards Generating App Feature Descriptions Automatically with LLMs: the Setapp Case Study [Industry Papers] Yevhenii Peteliev (MacPaw), Ivan Synytsia (MacPaw), Nataliia Stulova (MacPaw)

14:00 - 15:30	FORGE2025 Keynote [Keynotes] at 207. Chair(s): Michele Tufano (Google)

14:00 45m Keynote		Industry Keynote: One shall not live on LLM alone [Keynotes] Darya Rovdo (JetBrains)
14:45 45m Keynote		Industry Keynote: AI in Software Engineering at Google [Keynotes] Satish Chandra (Google, Inc)

13:30 - 14:00	FORGE2025 Poster [Research Papers] at Canada Hall 3 Poster Area

16:00 - 17:30	FORGE2025 Tutorial & Session 5: FM Evaluation [Keynotes / Tutorials / Research Papers] at 207. Chair(s): Xin Xia (Huawei)

16:00 12m Long-paper		Cyber-Attack Detection and Localization for SCADA system of CPSs [Research Papers] Dan Li (Sun Yat-sen University), Junnan Tang (Sun Yat-Sen University), Shunyu Wu (Sun Yat-Sen University), Zibin Zheng (Sun Yat-sen University), See-Kiong Ng (National University of Singapore)
16:12 12m Long-paper		A Comprehensive Study of Bug Characteristics on Foundation Language Models [Research Papers] Junxiao Han (Hangzhou City University), Guanqi Wang (Zhejiang University), Jiakun Liu (Singapore Management University), Lingfeng Bao (Zhejiang University), Xing Hu (Zhejiang University), Jinling Wei (Hangzhou City University), Shuiguang Deng (Zhejiang University; Alibaba-Zhejiang University Joint Institute of Frontier Technologies)
16:24 12m Long-paper		Testing Refactoring Engine via Historical Bug Report driven LLM [Research Papers] Haibo Wang (Concordia University), Zhuolin Xu (Concordia University), Shin Hwei Tan (Concordia University) [Pre-print]
16:36 45m Tutorial		Beyond Code Generation: Evaluating and Improving LLMs for Code Intelligence [Tutorials] Fatemeh Hendijani Fard (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)
17:21 9m Keynote		Industry Keynote: Enhancing Software Engineering with Large Language Models: Insights, Challenges, and Future Directions [Keynotes] Dong Qiu (Waterloo Research Center, Huawei Canada)

17:30 - 18:00	Closing [Research Papers] at 207. Chair(s): David Lo (Singapore Management University)

17:30 30m Day closing		Closing session of FORGE 2025 [Research Papers]

Data and Benchmarking Committee, FORGE 2025

Benchmarking Co-Chair: Antonio Mastropaolo (William and Mary, USA), United States
Benchmarking Co-Chair: Bowen Xu (North Carolina State University), United States
Kevin Moran (University of Central Florida), United States
Bin Lin (Hangzhou Dianzi University), China
Rosalia Tufano (Università della Svizzera Italiana), Switzerland
Andrea Stocco (Technical University of Munich, fortiss), Germany
Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University), Norway
Alberto Martin-Lopez (Software Institute - USI, Lugano), Switzerland
Ze Shi (Zane) Li (University of Victoria, Canada), Canada
Nan Jiang (Purdue University), United States
Jialun Cao (Hong Kong University of Science and Technology), China
Yanjie Zhao (Huazhong University of Science and Technology), China
Zhipeng Gao (Shanghai Institute for Advanced Study - Zhejiang University), China
Jinfu Chen (Wuhan University), China
Jieshan Chen (CSIRO's Data61), Australia