FORGE 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

This program is tentative and subject to change.

Recent advancements in AI have sparked a trend in constructing large, generalist language models that handle a multitude of tasks, including many code-related ones. While these models are expensive to train and are often closed-source, they have enjoyed broad adoption because they tend to outperform smaller, domain-specific models of code. In this work, we argue that this is not a foregone conclusion. We show that modestly sized domain-specific models can outperform much larger ones on code labeling tasks, provided they are trained to the same standards. Concretely, we focus on StackOverflow (SO), which offers large volumes of aligned code and text data. We align established best practices for pre-training large language models with the properties of StackOverflow as a data source, especially using a large context window (2,048 tokens), coupled with a powerful toolkit (Megatron-LM) to train two models: SOBertBase, with 125M parameters, and SOBertLarge, with 762M parameters, at a budget of just $374 and $1600, respectively. We compare the performance of our models with a prior domain-specific model that did not adopt many of these practices (BERTOverflow), as well as two general-purpose BERT models (BERTBase and BERTLarge), and two models in OpenAI’s GPT series (GPT-3.5 and GPT-4). We study four labeling tasks: question quality prediction, closed question prediction, named entity recognition, and obsoletion prediction. The final task is a new benchmark we introduce, on which we additionally compare SOBert with a fine-tuned CodeLlama and StackLlama (models with 10x more parameters than SOBertLarge). Our models, including the smaller one, consistently outperform all baselines. In contrast, BERTOverflow is outperformed by generalist models on most tasks. These results demonstrate that pre-training both extensively and properly on in-domain data can yield a powerful and affordable alternative to leveraging closed-source general-purpose models.
Both models are publicly released on Hugging Face, with over 500 downloads in the last month alone.
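The abstract highlights pre-training with a large context window (2,048 tokens) so that aligned code and text from a StackOverflow post stay together in one training sequence. The sketch below illustrates that data-preparation idea in minimal Python; it is not the authors' pipeline, and the whitespace "token" count is a crude stand-in for a real subword tokenizer.

```python
# Hypothetical sketch: greedily pack whole StackOverflow posts into
# pre-training sequences of at most 2,048 tokens, keeping each post's
# aligned code and natural language intact rather than truncating
# mid-post. Whitespace splitting approximates tokenization here.

MAX_CONTEXT = 2048

def pack_posts(posts, max_context=MAX_CONTEXT):
    """Pack posts (strings mixing prose and code) into sequences no
    longer than max_context tokens. A single post longer than
    max_context becomes its own over-long sequence and would need
    truncation downstream."""
    sequences, current, current_len = [], [], 0
    for post in posts:
        n_tokens = len(post.split())
        if current and current_len + n_tokens > max_context:
            # Current window is full: emit it and start a new one.
            sequences.append(" ".join(current))
            current, current_len = [], 0
        current.append(post)
        current_len += n_tokens
    if current:
        sequences.append(" ".join(current))
    return sequences
```

With posts of roughly 1,500, 1,000, and 100 tokens, the first post fills one sequence and the latter two share a second, since 1,000 + 100 still fits under the 2,048-token budget.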


Mon 28 Apr

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Session 4: Human-AI Collaboration & Legal Aspects of Using FM (Research Papers / Industry Papers) at 207
11:00
12m
Long-paper
Extracting Fix Ingredients using Language Models
Research Papers
Julian Prenner Free University of Bozen-Bolzano, Romain Robbes Univ. Bordeaux, CNRS
11:12
12m
Long-paper
CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning
Research Papers
Cuong Chi Le FPT Software AI Center, Hoang Nhat Phan Nanyang Technological University, Huy Nhat Phan FPT Software AI Center, Tien N. Nguyen University of Texas at Dallas, Nghi D. Q. Bui Salesforce Research
11:24
12m
Long-paper
Addressing Specific and Complex Scenarios in Semantic Parsing
Research Papers
Yu Wang Nanjing University, Ming Fan Xi'an Jiaotong University, Ting Liu Xi'an Jiaotong University
11:36
12m
Long-paper
Skill over Scale: The Case for Medium, Domain-Specific Models for SE
Research Papers
Manisha Mukherjee Carnegie Mellon University, Vincent J. Hellendoorn Carnegie Mellon University
Pre-print
11:48
12m
Long-paper
Resource-Efficient & Effective Code Summarization
Research Papers
Saima Afrin William & Mary, Joseph Call William & Mary, Khai Nguyen William & Mary, Oscar Chaparro William & Mary, Antonio Mastropaolo William & Mary
12:00
6m
Short-paper
How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering
Research Papers
Christoph Treude Singapore Management University, Marco Gerosa Northern Arizona University
Pre-print
12:06
6m
Short-paper
“So what if I used GenAI?” - Legal Implications of Using GenAI in Software Engineering Research
Research Papers
Gouri Ginde (Deshpande) University of Calgary
12:12
6m
Short-paper
Evaluating the Ability of GPT-4o to Generate Verifiable Specifications in VeriFast
Research Papers
Marilyn Rego Purdue University, Wen Fan Purdue University, Xin Hu University of Michigan - Ann Arbor, Sanya Dod, Zhaorui Ni Purdue University, Danning Xie Purdue University, Jenna DiVincenzo (Wise) Purdue University, Lin Tan Purdue University
12:18
6m
Short-paper
Towards Generating App Feature Descriptions Automatically with LLMs: the Setapp Case Study
Industry Papers