Impact of Request Formats on Effort Estimation: Are LLMs Different than Humans? (FSE 2025 - Research Papers)

Mon 23 - Fri 27 June 2025 Trondheim, Norway

Who

Gül Calikli, Mohammed Alhamed

Track

FSE 2025 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 24 Jun 2025 17:20 - 17:40 at Cosmos 3C - MSR 2 Chair(s): DongGyun Han

Abstract

Expert judgment is the dominant strategy used for software development effort estimation. Yet, expert-based judgment can provide over-optimistic effort estimates, leading to poor project budget planning and to cost and time overruns. Large Language Models (LLMs) are good candidates to assist software professionals in effort estimation. However, leveraging them effectively for software development effort estimation requires thoroughly investigating their limitations and the extent to which these overlap with those of (human) software practitioners. One primary limitation of LLMs is the sensitivity of their responses to prompt changes. Similarly, empirical studies have shown that changes in the request format (e.g., rephrasing) can impact (human) software professionals’ effort estimates. In this paper, we replicated a series of experiments, originally conducted with (human) software professionals in the literature, to see how LLMs’ effort estimates change with the transition from the traditional request format (i.e., “How much effort is required to complete X?”) to the alternative request format (i.e., “How much can be completed in Y work hours?”). Our experiments involved three different LLMs (GPT-4, Gemini 1.5 Pro, Llama 3.1) and 88 software project specifications (per treatment in each experiment), resulting in 880 prompts in total, which we prepared using 704 user stories from three open-source projects (Hyperledger Fabric, Mulesoft Mule, Spring XD). Our findings align with the original experiments conducted with software professionals: the first four experiments showed that LLMs provide lower effort estimates when transitioning from the traditional to the alternative request format. The fifth and first experiments showed that LLMs display patterns analogous to anchoring bias, a human cognitive bias defined as the tendency to stick to the anchor (i.e., the “Y work hours” in the alternative request format).

DOI

https://doi.org/10.1145/3715771

Gül Calikli

University of Glasgow

United Kingdom

Mohammed Alhamed

Applied Behaviour Systems LTD (Hexis), United Kingdom

Session Program

Tue 24 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

16:00 - 17:40	MSR 2 (Journal First / Ideas, Visions and Reflections / Research Papers / Demonstrations) at Cosmos 3C. Chair(s): DongGyun Han, Royal Holloway, University of London

16:00 (10m) Talk: Introducing Repository Stability. Track: Ideas, Visions and Reflections. Giuseppe Destefanis (Brunel University of London), Silvia Bartolucci (UCL), Daniel Graziotin (University of Hohenheim), Rumyana Neykova (Brunel University London), Marco Ortu (University of Cagliari)
16:10 (20m) Talk: Scientific Open-Source Software Is Less Likely To Become Abandoned Than One Might Think! Lessons from Curating a Catalog of Maintained Scientific Software. Track: Research Papers. Addi Malviya-Thakur (The University of Tennessee, Knoxville / Oak Ridge National Laboratory), Reed Milewicz (Sandia National Laboratories), Mahmoud Jahanshahi (University of Tennessee), Lavinia Francesca Paganini (Eindhoven University of Technology), Bogdan Vasilescu (Carnegie Mellon University), Audris Mockus (University of Tennessee)
16:30 (20m) Talk: Who Will Stop Contributing to OSS Projects? Predicting Company Turnover Based on Initial Behavior. Track: Research Papers. Mian Qin (Beijing Institute of Technology), Yuxia Zhang (Beijing Institute of Technology), Klaas-Jan Stol (Lero; University College Cork; SINTEF Digital), Hui Liu (Beijing Institute of Technology)
16:50 (20m) Talk: An empirical study of token-based micro commits. Track: Journal First. Masanari Kondo (Kyushu University), Daniel M. German (University of Victoria), Yasutaka Kamei (Kyushu University), Naoyasu Ubayashi (Waseda University), Osamu Mizuno (Kyoto Institute of Technology)
17:10 (10m) Talk: TS-Detector: Detecting Feature Toggle Usage Patterns. Track: Demonstrations. Md Tajmilur Rahman (Gannon University), Mengzhe Fei (University of Saskatchewan; Vendasta), Tushar Sharma (Dalhousie University), Chanchal K. Roy (University of Saskatchewan)
17:20 (20m) Talk: Impact of Request Formats on Effort Estimation: Are LLMs Different than Humans? Track: Research Papers. Gül Calikli (University of Glasgow), Mohammed Alhamed (Applied Behaviour Systems LTD (Hexis), United Kingdom)

Information for Participants

Tue 24 Jun 2025 16:00 - 17:40 at Cosmos 3C - MSR 2 Chair(s): DongGyun Han

Info for room Cosmos 3C:

Cosmos 3C is the third room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.