Beyond Accuracy and Robustness Metrics for Large Language Models for Code (ICSE 2024 - Doctoral Symposium)

Track

ICSE 2024 Doctoral Symposium

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 16 Apr 2024 14:00 - 15:30 at Fernando Pessoa - Focus Group: AI/ML for SE Chair(s): Reyhaneh Jabbarvand

Abstract

In recent years, Large Language Models for code (LLMc) have transformed the landscape of software engineering (SE), demonstrating significant efficacy in tasks such as code completion, summarization, review, tracing, translation, test case generation, clone detection, and bug fixing. Notably, GitHub Copilot and Google’s CodeBot exemplify how LLMc contributes to substantial time and effort savings in software development. However, despite their widespread use, there is a growing need to thoroughly assess LLMc, as current evaluation processes heavily rely on accuracy and robustness metrics, lacking consensus on additional influential factors in code generation. This gap hinders a holistic understanding of LLMc performance, impacting interpretability, efficiency, bias, fairness, and robustness. The challenges in benchmarking and data maintenance compound this issue, underscoring the necessity for a comprehensive evaluation approach. To address these issues, this dissertation proposes the development of a benchmarking infrastructure, named \approach, aimed at overcoming gaps in evaluating LLMc quality. The goal is to standardize testing scenarios, facilitate meaningful comparisons across LLMc, and provide multi-metric measurements beyond a sole focus on accuracy. This approach aims to decrease the costs associated with advancing LLMc research, enhancing their reliability for adoption in academia and industry.

Session Program

Tue 16 Apr
Displayed time zone: Lisbon

14:00 - 15:30	Focus Group: AI/ML for SE, Doctoral Symposium, at Fernando Pessoa. Chair(s): Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)

14:00 90m Poster	Beyond Accuracy: Evaluating Source Code Capabilities in Large Language Models for Software Engineering. Doctoral Symposium. Alejandro Velasco (William & Mary)
14:00 90m Poster	Towards Interpreting the Behavior of Large Language Models on Software Engineering Tasks. Doctoral Symposium. Atish Kumar Dipongkor (University of Central Florida)
14:00 90m Poster	Programming Language Models in Multilingual Settings. Doctoral Symposium. Jonathan Katzy (Delft University of Technology)
14:00 90m Poster	Beyond Accuracy and Robustness Metrics for Large Language Models for Code. Doctoral Symposium. Daniel Rodriguez-Cardenas
14:00 90m Poster	Towards Safe, Secure, and Usable LLMs4Code. Doctoral Symposium. Ali Al-Kaswan (Delft University of Technology, Netherlands)
