Enhancing Identifier Naming Through Multi-Mask Fine-tuning of Language Models of Code (SCAM 2024 - Research Track)

Who

Sanidhya Vijayvargiya, Mootez Saad, Tushar Sharma

Track

SCAM 2024 Research Track

Time Zone

The program is currently displayed in (GMT-07:00) Arizona.

Use conference time zone: (GMT-07:00) ArizonaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Mon 7 Oct 2024 16:04 - 16:20 at Fremont - Maintenance Chair(s): Mohamed Wiem Mkaouer

Abstract

Code readability strongly influences code comprehension and, to some degree, code quality. Unreadable code makes software maintenance more challenging and is prone to more bugs. To improve the readability, using good identifier names is crucial. Existing studies on automatic identifier renaming have not considered aspects such as the code context. Additionally, prior research has done little to address the typical challenges inherent in the identifier renaming task. In this paper, we propose a new approach for renaming identifiers in source code by fine-tuning a transformer model. Through the use of perplexity as an evaluation metric, our results demonstrate a significant decrease in the perplexity values for the fine-tuned approach compared to the baseline, reducing them from 363 to 36. To further validate our method, we conduct a developers’ survey to gauge the suitability of the generated identifiers, comparing original identifiers with identifiers generated with our approach as well as two state-of-the-art large language models, GPT-4 Turbo and Gemini Pro. Our approach generates better identifier names than the original names and exhibits competitive performance with state-of-the-art commercial large language models. The proposed method carries significant implications for software developers, tool vendors, and researchers. Software developers may use our proposed approach to generate better variable names, increasing the clarity and readability of the software. Researchers in the field may use and build upon the proposed approach for variable renaming.

Link to Preprint

https://tusharma.in/preprints/SCAM2024_ReIdentify.pdf

Sanidhya Vijayvargiya

BITS Pilani Hyderabad Campus

India

Mootez Saad

Dalhousie University

Canada

Tushar Sharma