Fri 2 May 2025 14:15 - 14:30 at 206 plus 208 - Human and Social 4 Chair(s): Liliana Pasquale

Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code match the patterns of developers. A poor understanding of the model reasoning process limits how current neural models are leveraged today: so far, mostly for their raw predictions. To fill this gap, this work studies how the processed attention signal of three open large language models (CodeGen, InCoder and GPT-J) agrees with how developers look at and explore code when each answers the same sensemaking questions about code. Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually labeled sessions from 25 developers engaged in sensemaking tasks. Against our ground truth of developers exploring code, we empirically evaluate five heuristics that do not use attention and ten approaches that post-process the attention signal of CodeGen. The latter include the novel concept of follow-up attention, which exhibits the highest agreement between model and human attention. Our follow-up attention method can predict the next line a developer will look at with 47% accuracy. This outperforms the baseline prediction accuracy of 42.3%, which uses the session history of other developers to recommend the next line. These results demonstrate the potential of leveraging the attention signal of pre-trained models for effective code exploration.
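To make the next-line prediction task concrete, the following is a minimal sketch of one way a history-based baseline could work. It is not the authors' implementation: the abstract does not say how the baseline is computed. The sketch assumes each session is a list of visited line numbers and recommends the line that other developers most often visited next. All names (`build_transition_counts`, `predict_next_line`, `top1_accuracy`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often line b is visited right after line a.

    Assumption: each session is a list of line numbers in visiting order.
    """
    counts = defaultdict(Counter)
    for lines in sessions:
        for current, nxt in zip(lines, lines[1:]):
            if current != nxt:  # skip repeated looks at the same line
                counts[current][nxt] += 1
    return counts


def predict_next_line(counts, current_line):
    """Return the most frequent follower of current_line, or None if unseen.

    Ties are broken in favor of the smallest line number.
    """
    followers = counts.get(current_line)
    if not followers:
        return None
    return max(sorted(followers), key=lambda line: followers[line])


def top1_accuracy(counts, session):
    """Fraction of line-to-line moves in a session that are predicted exactly."""
    moves = [(a, b) for a, b in zip(session, session[1:]) if a != b]
    if not moves:
        return 0.0
    hits = sum(predict_next_line(counts, a) == b for a, b in moves)
    return hits / len(moves)


# Toy data (hypothetical): sessions of other developers on the same snippet.
other_sessions = [[1, 2, 3, 2, 5], [1, 2, 5, 6], [2, 3, 4]]
counts = build_transition_counts(other_sessions)
print(predict_next_line(counts, 1))
print(top1_accuracy(counts, [1, 2, 5, 6]))
```

An attention-based predictor would plug into the same evaluation by replacing `predict_next_line` with a function that ranks candidate lines by a model-derived score; how follow-up attention computes that score is described in the paper and is not reproduced here.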

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Human and Social 4
Journal-first Papers / SE in Society (SEIS) / SE In Practice (SEIP) / Research Track at 206 plus 208
Chair(s): Liliana Pasquale University College Dublin & Lero
14:00
15m
Talk
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
SE In Practice (SEIP)
Nadia Nahar Carnegie Mellon University, Christian Kästner Carnegie Mellon University, Jenna L. Butler Microsoft Research, Chris Parnin Microsoft, Thomas Zimmermann University of California, Irvine, Christian Bird Microsoft Research
14:15
15m
Talk
Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration
Journal-first Papers
Matteo Paltenghi University of Stuttgart, Rahul Pandita GitHub, Inc., Austin Henley Carnegie Mellon University, Albert Ziegler XBow
14:30
15m
Talk
Do Developers Adopt Green Architectural Tactics for ML-Enabled Systems? A Mining Software Repository Study
Artifact-Reusable, Artifact-Available, Artifact-Functional
SE in Society (SEIS)
Vincenzo De Martino University of Salerno, Silverio Martínez-Fernández UPC-BarcelonaTech, Fabio Palomba University of Salerno
14:45
15m
Talk
Accessibility Issues in Ad-Driven Web Applications
Artifact-Functional, Artifact-Available, Artifact-Reusable
Research Track
Abdul Haddi Amjad Virginia Tech, Muhammad Danish Virginia Tech, Bless Jah Virginia Tech, Muhammad Ali Gulzar Virginia Tech
15:00
15m
Talk
A Bot-based Approach to Manage Codes of Conduct in Open-Source Projects
SE in Society (SEIS)
Sergio Cobos IN3 - UOC, Javier Luis Cánovas Izquierdo Universitat Oberta de Catalunya
15:15
7m
Talk
Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses
Security
Journal-first Papers
Wachiraphan (Ping) Charoenwet University of Melbourne, Patanamon Thongtanunam University of Melbourne, Thuan Pham University of Melbourne, Christoph Treude Singapore Management University