
This program is tentative and subject to change.

Thu 1 May 2025 11:45 - 12:00 at 215 - SE for AI 2

Software process models are essential to facilitate collaboration and communication among software teams solving complex development tasks. Inspired by these software engineering practices, we present FlowGen, a code generation framework that emulates software process models using multiple Large Language Model (LLM) agents. We emulate three process models, FlowGen_Waterfall, FlowGen_TDD, and FlowGen_Scrum, by assigning LLM agents to embody roles (i.e., requirement engineer, architect, developer, tester, and scrum master) that correspond to everyday development activities, and by organizing their communication patterns. The agents work collaboratively using chain-of-thought and prompt composition with continuous self-refinement to improve code quality. We use GPT-3.5 as the underlying LLM and several baselines (RawGPT, CodeT, Reflexion) to evaluate code generation on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET. Our findings show that FlowGen_Scrum outperforms the other process models, achieving a Pass@1 of 75.2, 65.5, 82.5, and 56.7 on HumanEval, HumanEval-ET, MBPP, and MBPP-ET, respectively (an average improvement of 15% over RawGPT). Compared with other state-of-the-art techniques, FlowGen_Scrum achieves a higher Pass@1 on MBPP than CodeT, with both outperforming Reflexion. Notably, integrating CodeT into FlowGen_Scrum yields statistically significant improvements and the highest Pass@1 scores. Our analysis also reveals that the development activities affect code smells and exception handling differently: design and code review add more exception handling and reduce code smells. Finally, the FlowGen models maintain stable Pass@1 scores across GPT-3.5 versions and temperature values, highlighting the effectiveness of software process models in enhancing the quality and stability of LLM-generated code.
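
As a rough illustration of the multi-agent pipeline the abstract describes, the sketch below chains role-conditioned GPT-3.5 calls (requirement engineer, architect, developer, tester) with a short self-refinement loop, written in Python against the OpenAI chat completions API. The role prompts, message flow, number of refinement rounds, and helper names (ask, flowgen_scrum) are illustrative assumptions, not the authors' implementation; the scrum-master and code-review activities from the paper are omitted for brevity.

# Minimal sketch of a FlowGen_Scrum-style pipeline. All role prompts and the
# refinement loop below are assumptions for illustration, not the authors' code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(role_prompt: str, task: str) -> str:
    """One role-conditioned call to the underlying LLM (GPT-3.5)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": role_prompt},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

def flowgen_scrum(requirement: str, refinement_rounds: int = 2) -> str:
    # Each agent embodies one development role, as in the abstract.
    spec = ask("You are a requirement engineer. Clarify and list the requirements.",
               requirement)
    design = ask("You are a software architect. Produce a high-level design.", spec)
    code = ask("You are a developer. Implement the design as Python code.",
               "Specification:\n" + spec + "\n\nDesign:\n" + design)
    # Continuous self-refinement: tester feedback drives developer revisions.
    for _ in range(refinement_rounds):
        feedback = ask("You are a tester. Point out bugs and missing edge cases.",
                       code)
        code = ask("You are a developer. Revise the code to address the feedback.",
                   "Code:\n" + code + "\n\nTester feedback:\n" + feedback)
    return code

if __name__ == "__main__":
    print(flowgen_scrum("Write a function that checks whether a string is a palindrome."))

Sequencing the roles this way mirrors the sprint-style hand-offs the paper emulates; reordering the tester ahead of the developer would approximate FlowGen_TDD instead.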

Thu 1 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
11:00
15m
Talk
Answering User Questions about Machine Learning Models through Standardized Model Cards (SE for AI)
Research Track
Tajkia Rahman Toma (University of Alberta), Balreet Grewal (University of Alberta), Cor-Paul Bezemer (University of Alberta)
11:15
15m
Talk
Fairness Testing through Extreme Value Theory (SE for AI)
Research Track
Verya Monjezi (University of Texas at El Paso), Ashutosh Trivedi (University of Colorado Boulder), Vladik Kreinovich (University of Texas at El Paso), Saeid Tizpaz-Niari (University of Illinois Chicago)
11:30
15m
Talk
Fixing Large Language Models' Specification Misunderstanding for Better Code Generation (SE for AI)
Research Track
Zhao Tian (Tianjin University), Junjie Chen (Tianjin University), Xiangyu Zhang (Purdue University)
Pre-print
11:45
15m
Talk
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents (SE for AI)
Research Track
Feng Lin (Concordia University), Dong Jae Kim (DePaul University), Tse-Hsun (Peter) Chen (Concordia University)
12:00
15m
Talk
The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products (SE for AI)
Research Track
Nadia Nahar (Carnegie Mellon University), Haoran Zhang (Carnegie Mellon University), Grace Lewis (Carnegie Mellon Software Engineering Institute), Shurui Zhou (University of Toronto), Christian Kästner (Carnegie Mellon University)
12:15
15m
Talk
Towards Trustworthy LLMs for Code: A Data-Centric Synergistic Auditing Framework (SE for AI)
New Ideas and Emerging Results (NIER)
Chong Wang (Nanyang Technological University), Zhenpeng Chen (Nanyang Technological University), Li Tianlin (Nanyang Technological University), Yilun Zhang (AIXpert), Yang Liu (Nanyang Technological University)