Beyond Accuracy: Evaluating Source Code Capabilities in Large Language Models for Software Engineering
This dissertation introduces interpretability techniques to comprehensively evaluate the performance of Large Language Models (LLMs) on software engineering tasks, beyond canonical metrics. In software engineering, Deep Learning techniques are widely employed across various domains, automating tasks such as code comprehension, bug fixing, code summarization, machine translation, and code generation. However, the prevalent use of accuracy-based metrics for evaluating Language Models trained on code often leads to an overestimation of their performance. Our work proposes novel and comprehensive interpretability techniques to evaluate source code capabilities and to provide a more nuanced understanding of LLMs' performance across downstream tasks.
Tue 16 Apr (time zone: Lisbon)
14:00 - 15:30 | Focus Group: AI/ML for SE (Doctoral Symposium) at Fernando Pessoa | Chair(s): Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)
14:00 (90m) | Poster | Beyond Accuracy: Evaluating Source Code Capabilities in Large Language Models for Software Engineering | Doctoral Symposium | Alejandro Velasco (William & Mary)
14:00 (90m) | Poster | Towards Interpreting the Behavior of Large Language Models on Software Engineering Tasks | Doctoral Symposium | Atish Kumar Dipongkor (University of Central Florida)
14:00 (90m) | Poster | Programming Language Models in Multilingual Settings | Doctoral Symposium | Jonathan Katzy (Delft University of Technology)
14:00 (90m) | Poster | Beyond Accuracy and Robustness Metrics for Large Language Models for Code | Doctoral Symposium
14:00 (90m) | Poster | Towards Safe, Secure, and Usable LLMs4Code | Doctoral Symposium | Ali Al-Kaswan (Delft University of Technology, Netherlands)