Tue 3 Sep 2024 12:10 - 12:25 at LT1 - Session 1: AI-Assisted Development Chair(s): Stefan Sauer

The increasing use of large language model (LLM)-powered code-generation tools such as GitHub Copilot is transforming software engineering practices. This paper investigates how developers validate and repair code generated by Copilot and examines the impact of code-provenance awareness during these processes. We conducted a lab study in which 28 participants validated and repaired Copilot-generated code in three software projects. Participants were randomly divided into two groups: one was informed that the code was LLM-generated and the other was not. We collected IDE-interaction data, eye-tracking data, and cognitive workload assessments, and conducted semi-structured interviews. Our results indicate that, without explicit information, developers often fail to identify the LLM origin of the code. Developers exhibited LLM-specific behaviors, such as frequent switching between code and comments, a distinct attentional focus, and a tendency to delete and rewrite code. Awareness of the code’s provenance led to improved performance, increased search effort, more frequent Copilot usage, and higher cognitive workload. These findings deepen our understanding of developer interactions with LLM-generated code and inform the design of tools for effective human-LLM collaboration in software development.

Tue 3 Sep

Displayed time zone: London

11:00 - 12:30
Session 1: AI-Assisted Development (Research Papers) at LT1
Chair(s): Stefan Sauer Paderborn University
11:00
20m
Talk
Let’s Fix this Together: Conversational Debugging with GitHub Copilot
Research Papers
Yasharth Bajpai Microsoft, Bhavya Chopra Microsoft, Param Biyani Microsoft, Cagri Aslan Microsoft, Dustin Coleman Microsoft, Sumit Gulwani Microsoft, Chris Parnin Microsoft, Arjun Radhakrishna Microsoft, Gustavo Soares Microsoft
11:20
20m
Talk
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Research Papers
11:40
15m
Short-paper
Leveraging Visual Languages to Foster User Participation in Designing Trustworthy Machine Learning Systems: A Comparative Study
Research Papers
Serena Versino University of Pisa, Tommaso Turchi University of Pisa, Alessio Malizia Brunel University
11:55
15m
Short-paper
Harnessing the Power of LLMs to Simplify Security: LLM Summarization for Human-Centric DAST Reports
Research Papers
Arpit Thool Virginia Tech, USA, Chris Brown Virginia Tech
12:10
15m
Short-paper
A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions
Research Papers
Ningzhi Tang University of Notre Dame, Meng Chen, Zheng Ning University of Notre Dame, Aakash Bansal University of Notre Dame, Yu Huang Vanderbilt University, Collin McMillan University of Notre Dame, Toby Jia-Jun Li University of Notre Dame
Link to publication · Pre-print