Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for automated program repair (APR) struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line-level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMFL, the first language-model-based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on the larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMFL improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%. LLMFL is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level.
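The Top-1 and Top-5 figures quoted in the abstract are ranking metrics. As a rough illustration only, the sketch below assumes the common fault-localization convention that a bug counts as a Top-N hit when at least one of its known buggy lines appears among the N highest-scored lines; the paper's exact evaluation protocol may differ. The function names, the data layout, and the scores are hypothetical and are not taken from LLMFL.

```python
from typing import Dict, List, Set


def top_n_hit(line_scores: Dict[int, float], buggy_lines: Set[int], n: int) -> bool:
    """Return True if any known buggy line is among the n highest-scored lines."""
    # Rank line numbers by descending suspiciousness score;
    # ties are broken by line number so the ranking is deterministic.
    ranked = sorted(line_scores, key=lambda ln: (-line_scores[ln], ln))
    return any(ln in buggy_lines for ln in ranked[:n])


def top_n_count(bugs: List[dict], n: int) -> int:
    """Count the bugs that are localized within the top n ranked lines."""
    return sum(top_n_hit(b["scores"], b["buggy"], n) for b in bugs)


# Hypothetical per-line suspiciousness scores for three bugs (illustrative only).
bugs = [
    {"scores": {1: 0.1, 2: 0.9, 3: 0.3}, "buggy": {2}},
    {"scores": {1: 0.2, 2: 0.1, 3: 0.6, 4: 0.5}, "buggy": {4}},
    {"scores": {1: 0.7, 2: 0.6, 3: 0.5, 4: 0.4, 5: 0.3, 6: 0.2}, "buggy": {6}},
]

print("Top-1:", top_n_count(bugs, 1))
print("Top-5:", top_n_count(bugs, 5))
```

Under this assumed definition, a relative improvement in Top-N means that more bugs have a buggy line ranked within the first N positions; it says nothing about how the scores themselves are produced.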
Wed 17 Apr. Displayed time zone: Lisbon
16:00 - 17:30 | LLM, NN and other AI technologies 2 | Journal-first Papers / Software Engineering in Practice / New Ideas and Emerging Results / Research Track / Software Engineering in Society | at Luis de Freitas Branco | Chair(s): Jane Cleland-Huang, University of Notre Dame
16:00 | 15m | Talk | Large Language Models for Test-Free Fault Localization | Research Track | Aidan Z.H. Yang, Carnegie Mellon University; Claire Le Goues, Carnegie Mellon University; Ruben Martins, Carnegie Mellon University; Vincent J. Hellendoorn, Carnegie Mellon University
16:15 | 15m | Talk | Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection | Research Track | Benjamin Steenhoek, Iowa State University; Hongyang Gao, Dept. of Computer Science, Iowa State University; Wei Le, Iowa State University
16:30 | 15m | Talk | An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms | Software Engineering in Society | Francesco Sovrano, University of Zurich; Michaël Lognoul, University of Namur (CRIDS, NADI); Alberto Bacchelli, University of Zurich
16:45 | 15m | Talk | An Industry Case Study on Adoption of AI-based Programming Assistants | Software Engineering in Practice | Nicole Davila, Universidade Federal do Rio Grande do Sul; Igor Wiese, Federal University of Technology; Igor Steinmacher, Northern Arizona University; Lucas Lucio, Federal University of Technology - Paraná (UTFPR); André Kawamoto, Federal University of Technology - Paraná (UTFPR); Gilson José Peres Favaro; Ingrid Nunes, Universidade Federal do Rio Grande do Sul (UFRGS), Brazil
17:00 | 7m | Talk | Assessing LLMs for High Stakes Applications | Software Engineering in Practice | Shannon K. Gallagher, Jasmine Ratchford, Tyler Brooks, Bryan P. Brown, Eric Heim, William R. Nichols, Scott McMillan, Swati Rallapalli, Carol J. Smith, Nathan VanHoudnos, Nick Winski, Andrew O. Mellinger (all Software Engineering Institute, Carnegie Mellon University)
17:07 | 7m | Talk | ITG: Trace Generation via Iterative Interaction between LLM Query and Trace Checking | New Ideas and Emerging Results | Weilin Luo, Sun Yat-sen University; Weiyuan Fang, Sun Yat-sen University; Junming Qiu, Sun Yat-sen University; Hai Wan, School of Data and Computer Science, Sun Yat-sen University; Yanan Liu, Sun Yat-sen University; Rongzhen Ye, Sun Yat-sen University
17:14 | 7m | Talk | Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks | Journal-first Papers | Nikita Mehrotra, Indraprastha Institute of Information Technology; Akash Sharma, IIIT-Delhi; Anmol Jindal, IIIT-Delhi; Rahul Purandare, UNL, USA