A Reference Model for Empirically Comparing LLMs with Humans
SE for AI
Large Language Models (LLMs) have shown stunning abilities to carry out tasks that were previously conducted by humans. The future role of humans and the responsibilities assigned to non-human LLMs affect society fundamentally. In that context, LLMs have often been compared to humans. However, it is surprisingly difficult to make a fair empirical comparison between humans and LLMs. To address those difficulties, we aim to establish a systematic approach that guides researchers in comparing LLMs with humans across various linguistic and cognitive tasks. We developed a reference model of the information flow in an exploratory research study. Through a literature review, we examined key differences and similarities among several existing studies. We propose a framework to support researchers in designing and executing studies, and in assessing LLMs with respect to humans. Future studies can use the reference model as guidance for designing and reporting their own unique study design by mapping key decisions to the decision points of that reference model. We want to support researchers and society in taking a maturation step in this emerging and constantly growing field.
Thu 1 May | Time zone: Eastern Time (US & Canada)
14:00 - 15:30 | SE for AI 3 | Research Track / SE in Society (SEIS) / Journal-first Papers | Room 215 | Chair(s): Lina Marsso (École Polytechnique de Montréal)
14:00 | 15m Talk | Dissecting Global Search: A Simple yet Effective Method to Boost Individual Discrimination Testing and Repair | SE for AI, Research Track | Lili Quan (Tianjin University), Li Tianlin (NTU), Xiaofei Xie (Singapore Management University), Zhenpeng Chen (Nanyang Technological University), Sen Chen (Nankai University), Lingxiao Jiang (Singapore Management University), Xiaohong Li (Tianjin University)
14:15 | 15m Talk | FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation | SE for AI, Research Track | Yang Sun (Singapore Management University), Chris Poskitt (Singapore Management University), Kun Wang (Zhejiang University), Jun Sun (Singapore Management University)
14:30 | 15m Talk | MARQ: Engineering Mission-Critical AI-based Software with Automated Result Quality Adaptation | SE for AI, Research Track | Uwe Gropengießer (Technical University of Darmstadt), Elias Dietz (Technical University of Darmstadt), Florian Brandherm (Technical University of Darmstadt), Achref Doula (Technical University of Darmstadt), Osama Abboud (Munich Research Center, Huawei), Xun Xiao (Munich Research Center, Huawei), Max Mühlhäuser (Technical University of Darmstadt)
14:45 | 15m Talk | An Empirical Study of Challenges in Machine Learning Asset Management | SE for AI, Journal-first Papers | Zhimin Zhao (Queen's University), Yihao Chen (Queen's University), Abdul Ali Bangash (Software Analysis and Intelligence Lab (SAIL), Queen's University, Canada), Bram Adams (Queen's University), Ahmed E. Hassan (Queen's University)
15:00 | 15m Talk | A Reference Model for Empirically Comparing LLMs with Humans | SE for AI, SE in Society (SEIS) | Kurt Schneider (Leibniz Universität Hannover, Software Engineering Group), Farnaz Fotrousi (Chalmers University of Technology and University of Gothenburg), Rebekka Wohlrab (Chalmers University of Technology)
15:15 | 7m Talk | Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice | SE for AI, Journal-first Papers | Bentley Oakes (Polytechnique Montréal), Michalis Famelis (Université de Montréal), Houari Sahraoui (DIRO, Université de Montréal)