
Developers rely on code comments to document their work, track issues, and understand source code. As such, comments provide valuable insight into developers' understanding of their code and describe their intentions in writing the surrounding code. Recent research leverages natural language processing and deep learning to classify comments according to developers' intentions. However, such labelled data are often imbalanced, causing learning models to perform poorly. This work investigates different weighting strategies for the loss function to mitigate the scarcity of certain classes in the dataset. In particular, various RoBERTa-based transformer models are fine-tuned by means of a hyperparameter search to identify their optimal parameter configurations, and are additionally fine-tuned with different weighting strategies for the loss function to address class imbalance. Our approach outperforms the STACC baseline by 8.9 per cent on the NLBSE'25 Tool Competition dataset in terms of the average F1 score, and exceeds the baseline in 17 out of 19 cases, with gains ranging from -5.0 to 38.2. The source code is publicly available at https://github.com/moritzmock/NLBSE2025.
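The abstract does not spell out the weighting schemes used; a common way to weight the loss function under class imbalance is to scale each class's cross-entropy term by its inverse frequency in the training set. The NumPy sketch below is illustrative only: the function names and the normalization are assumptions, not the authors' implementation.

```python
import numpy as np

def class_weights(labels, num_classes):
    # Inverse-frequency weights: rare classes get proportionally larger
    # weights, so their training errors are penalized more heavily.
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))

def weighted_cross_entropy(logits, labels, weights):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the true class, per sample.
    nll = -np.log(probs[np.arange(len(labels)), labels])
    # Weight each sample by its class weight and average.
    w = weights[labels]
    return float((w * nll).sum() / w.sum())
```

In frameworks such as PyTorch the same idea is expressed by passing a per-class `weight` tensor to the cross-entropy loss; the sketch above just makes the arithmetic explicit.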

Sun 27 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30 | Session 2 - Tool Competition (NLBSE) at 214
Chair(s): Maliheh Izadi (Delft University of Technology), Sebastiano Panichella (University of Bern), Giuseppe Colavito (University of Bari), Pooja Rani (University of Zurich), Ali Al-Kaswan (Delft University of Technology, Netherlands), Nataliia Stulova (MacPaw)
14:00 (10m) | Other
Opening & Code Comment Classification Competition (NLBSE)
Giuseppe Colavito (University of Bari), Pooja Rani (University of Zurich), Ali Al-Kaswan (Delft University of Technology, Netherlands), Nataliia Stulova (MacPaw)
14:10 (10m) | Demonstration
Code Comment Classification with Data Augmentation and Transformer-Based Models (NLBSE)
Mushfiqur Rahman, Mohammed Latif Siddiq (University of Notre Dame)
14:20 (10m) | Demonstration
GRAPHiC: Utilizing Graph Structures and Class Weights in Code Comment Classification with Pretrained BERT Models (NLBSE)
Pir Sami Ullah Shah (FAST National University), Shahela Saif, Muhammad Haris Athar (FAST National University), Muhammad Riyaan Tariq (National University of Computer and Emerging Sciences, Islamabad, Pakistan), Abdur Rehman Afzal (FAST National University)
14:30 (10m) | Demonstration
Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification (NLBSE)
Fabian C. Peña (University of Passau), Steffen Herbold (University of Passau)
14:40 (10m) | Demonstration
Optimizing Deep Learning Models to Address Class Imbalance in Code Comment Classification (NLBSE)
Moritz Mock (Free University of Bozen-Bolzano), Thomas Borsani, Giuseppe Di Fatta, Barbara Russo (Free University of Bozen-Bolzano, Italy)
Pre-print
14:50 (10m) | Demonstration
CodeComClassify: Automating Code Comments Classification using BERT-based Language Models (NLBSE)
Khubaib Amjad Alam, Wajid Ali, Nadeem Abbas (Linnaeus University), Muhammad Haroon (FAST National University), Summan Aziz, Meer Hashaam Khan (FAST National University), Zahoor Ahmad
15:00 (10m) | Other
Competition Closing (NLBSE)
Giuseppe Colavito (University of Bari), Pooja Rani (University of Zurich), Ali Al-Kaswan (Delft University of Technology, Netherlands), Nataliia Stulova (MacPaw)
15:10 (10m) | Day closing
Workshop closing (NLBSE)
Maliheh Izadi (Delft University of Technology), Sebastiano Panichella (University of Bern)