A Reference Model for Empirically Comparing LLMs with Humans
SE for AI
Large Language Models (LLMs) have shown stunning abilities to carry out tasks that were previously conducted by humans. The future role of humans and the responsibilities assigned to non-human LLMs affect society fundamentally. In that context, LLMs have often been compared to humans. However, it is surprisingly difficult to make a fair empirical comparison between humans and LLMs. To address those difficulties, we aim to establish a systematic approach that guides researchers in comparing LLMs with humans across various linguistic and cognitive tasks. We developed a reference model of the information flow in an exploratory research study. Through a literature review, we examined key differences and similarities among several existing studies. We propose a framework to support researchers in designing and executing studies, and in assessing LLMs with respect to humans. Future studies can use the reference model as guidance for designing and reporting their own unique study design by mapping key decisions to the decision points of that reference model. We want to support researchers and society in taking a maturation step in this emerging and constantly growing field.
Thu 1 May | Time zone: Eastern Time (US & Canada)
14:00 - 15:30 | SE for AI 3 | Research Track / SE in Society (SEIS) / Journal-first Papers | Room 215 | Chair(s): Lina Marsso (École Polytechnique de Montréal)
14:00 | 15m Talk | Dissecting Global Search: A Simple yet Effective Method to Boost Individual Discrimination Testing and Repair | SE for AI, Research Track | Lili Quan (Tianjin University), Li Tianlin (NTU), Xiaofei Xie (Singapore Management University), Zhenpeng Chen (Nanyang Technological University), Sen Chen (Nankai University), Lingxiao Jiang (Singapore Management University), Xiaohong Li (Tianjin University)
14:15 | 15m Talk | FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation | SE for AI, Research Track | Yang Sun (Singapore Management University), Chris Poskitt (Singapore Management University), Kun Wang (Zhejiang University), Jun Sun (Singapore Management University)
14:30 | 15m Talk | MARQ: Engineering Mission-Critical AI-based Software with Automated Result Quality Adaptation | SE for AI, Research Track | Uwe Gropengießer (Technical University of Darmstadt), Elias Dietz (Technical University of Darmstadt), Florian Brandherm (Technical University of Darmstadt), Achref Doula (Technical University of Darmstadt), Osama Abboud (Munich Research Center, Huawei), Xun Xiao (Munich Research Center, Huawei), Max Mühlhäuser (Technical University of Darmstadt)
14:45 | 15m Talk | An Empirical Study of Challenges in Machine Learning Asset Management | SE for AI, Journal-first Papers | Zhimin Zhao (Queen's University), Yihao Chen (Queen's University), Abdul Ali Bangash (Software Analysis and Intelligence Lab (SAIL), Queen's University, Canada), Bram Adams (Queen's University), Ahmed E. Hassan (Queen's University)
15:00 | 15m Talk | A Reference Model for Empirically Comparing LLMs with Humans | SE for AI, SE in Society (SEIS) | Kurt Schneider (Leibniz Universität Hannover, Software Engineering Group), Farnaz Fotrousi (Chalmers University of Technology and University of Gothenburg), Rebekka Wohlrab (Chalmers University of Technology)
15:15 | 7m Talk | Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice | SE for AI, Journal-first Papers | Bentley Oakes (Polytechnique Montréal), Michalis Famelis (Université de Montréal), Houari Sahraoui (DIRO, Université de Montréal)