Enhancing Python Code Maintainability through Large Language Model-Based Approaches (PROFES 2025 - Research Papers) - PROFES 2025

Mon 1 - Wed 3 December 2025, Salerno, Italy

Who

Karthik Shivashankar, Antonio Martini

Track

PROFES 2025 Research Papers

This program is tentative and subject to change.

When

Tue 2 Dec 2025 09:30 - 09:45 at Room 1 - Technical Debt and Refactoring

Abstract

While Large Language Models (LLMs) increasingly assist in code generation, concerns persist regarding the maintainability of the code they produce, an aspect often overshadowed by functional correctness. Overlooking maintainability can contribute to technical debt and inflate long-term software costs. This research investigates whether targeted fine-tuning can enhance an LLM’s ability to generate more maintainable Python code. We developed an approach involving the curation of custom datasets (from CommitPackFT and Code Alpaca Python subsets) specifically annotated for maintainability using metrics like Source Lines of Code (SLOC), Halstead Effort, and Maintainability Index (MI). A weak-to-strong generalization strategy was employed, using a smaller model (Phi 4 14B) to generate maintainability-focused examples for fine-tuning a larger model (QwenCoder2.5 32B Instruct) with parameter-efficient techniques. Evaluations revealed the fine-tuned model significantly reduced code complexity (Halstead Effort) and length (SLOC) compared to the original code samples. While the model preserved high functional similarity (verified by CodeBERTScore), results for the Maintainability Index metric were inconclusive in this evaluation. Performance on standard functional correctness benchmarks (HumanEval+, MBPP+) was largely comparable to the base model. Nevertheless, expert user feedback confirmed the fine-tuned model’s utility as a practical AI companion for code refactoring to improve maintainability.
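The three metrics named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' tooling, which the abstract does not specify. It assumes the textbook Halstead definitions (effort = difficulty × volume), the classic Maintainability Index formula 171 - 5.2 ln(V) - 0.23 G - 16.2 ln(LOC), and a deliberately simplified cyclomatic complexity count. The function names and the two sample snippets are hypothetical and serve only as an illustration.

```python
import ast
import io
import keyword
import math
import tokenize


def sloc(source: str) -> int:
    """Count non-blank lines that are not comment-only (a simplified SLOC)."""
    lines = (line.strip() for line in source.splitlines())
    return sum(1 for line in lines if line and not line.startswith("#"))


def halstead(source: str) -> dict:
    """Textbook Halstead measures; keywords and punctuation count as operators."""
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or is_keyword:
            operators.append(tok.string)
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total counts
    vocabulary, length = n1 + n2, N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {"volume": volume, "difficulty": difficulty,
            "effort": difficulty * volume}


def cyclomatic(source: str) -> int:
    """Simplified assumption: 1 + number of branching nodes in the AST."""
    branches = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.BoolOp)
    return 1 + sum(isinstance(n, branches) for n in ast.walk(ast.parse(source)))


def maintainability_index(source: str) -> float:
    """Classic (unnormalized) MI; tools such as radon use rescaled variants."""
    volume = max(halstead(source)["volume"], 1.0)  # guard against log(0)
    lines = max(sloc(source), 1)
    return (171 - 5.2 * math.log(volume)
            - 0.23 * cyclomatic(source) - 16.2 * math.log(lines))


# Hypothetical before/after pair, standing in for an original sample
# and a more maintainable rewrite.
VERBOSE = """\
def total(xs):
    result = 0
    for x in xs:
        result = result + x
    return result
"""

CONCISE = """\
def total(xs):
    return sum(xs)
"""

if __name__ == "__main__":
    for name, src in (("verbose", VERBOSE), ("concise", CONCISE)):
        print(name, sloc(src), round(halstead(src)["effort"], 1),
              round(maintainability_index(src), 1))
```

On this toy pair the shorter rewrite has lower SLOC and Halstead Effort and a higher MI. Real code does not always move all three metrics in the same direction, which is consistent with the abstract reporting clear gains on SLOC and Halstead Effort but inconclusive results for MI.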

Karthik Shivashankar

University of Oslo

Antonio Martini

University of Oslo, Norway


Session Program

Tue 2 Dec
Displayed time zone: (GMT+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

	09:30 - 11:00	Technical Debt and Refactoring (Short Papers and Posters / Research Papers) at Room 1

	09:30 15m Talk		Enhancing Python Code Maintainability through Large Language Model-Based Approaches	Research Papers	Karthik Shivashankar (University of Oslo), Antonio Martini (University of Oslo, Norway)
	09:45 15m Talk		Enhancing Software Maintainability through LLM-Assisted Code Refactoring	Research Papers	Tommaso Fulcini (Politecnico di Torino), Riccardo Coppola (Politecnico di Torino), Flavio Giobergia (Politecnico di Torino), Amirali Changizi (Politecnico di Torino), Meelad Dashti (Politecnico di Torino), Kimia Dorrani (Politecnico di Torino), Domenico Amalfitano (University of Naples Federico II), Damiano Distante (UnitelmaSapienza University of Rome), Filippo Ricca (DIBRIS, Università di Genova)
	10:00 15m Talk		Temporal Evolution of Architectural Complexity and Technical Debt in Microservices: An Exploratory Case Study	Research Papers	Bhuwan Paudel (Blekinge Institute of Technology), Javier Gonzalez-Huerta (Blekinge Institute of Technology), Ehsan Zabardast (Nordea / Blekinge Institute of Technology)
	10:15 15m Talk		Detecting Technical Debt in Source Code Changes using Large Language Models	Research Papers	Merve Astekin (SINTEF), Arda Goknil (SINTEF Digital), Sagar Sen, Simeon Tverdal (SINTEF Digital), Phu Nguyen (SINTEF)
	10:30 7m Talk		LLM-based Multi-Agent System for Intelligent Refactoring of Haskell Code	Short Papers and Posters	Shahbaz Siddeeq (Tampere University), Muhammad Waseem (Faculty of Information Technology and Communication Sciences, Tampere University, 33014 Tampere, Finland), Zeeshan Rasheed (Tampere University), Md Mahade Hasan (Tampere University), Jussi Rasku (Tampere University), Mika Saari (Tampere University), Henri Terho (Eficode Oy), Kalle Mäkelä (Eficode Oy), Kai-Kristian Kemell (Tampere University), Pekka Abrahamsson (Tampere University)
	10:37 7m Talk		Architecture Degradation at Scale: Challenges and Insights from Practice	Short Papers and Posters	Ehsan Zabardast (Nordea / Blekinge Institute of Technology), Bhuwan Paudel (Blekinge Institute of Technology), Javier Gonzalez-Huerta (Blekinge Institute of Technology)
	10:44 7m Talk		How Well Small Language Models Can Be Adapted for Software Maintenance and Refactoring Tasks	Short Papers and Posters	Gabija Asvydyte (University of Groningen), Sushant Kumar Pandey (University of Groningen, The Netherlands), Sivajeet Chand (Technical University of Munich)