On the Use of GPT-4 in the Reverse Engineering of Class Diagrams (ECMFA 2025)

Who

Victor Campanello, Shariq Shahbaz, Vladislav Indykov, Daniel Strüber

Track

ECMFA 2025

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 11 Jun 2025 14:00 - 14:30 at M 001 - ECMFA Session 5: Maintenance Chair(s): Riccardo Rubei

Abstract

Class diagrams are a standard notation for effectively visualizing the structure of a software system in the context of software design and analysis. In particular, class diagrams are widely used in reverse engineering, the main goal of which is to reconstruct and analyze the design of a system from a given codebase to understand and improve it. Yet, traditional reverse engineering tools that generate class diagrams from code often produce cluttered outputs due to their inability to perform abstraction, that is, leaving out or summarizing nonessential elements in the way human experts would. In this paper, we explore the use of large language models, specifically GPT-4, in generating class diagrams from code to emulate human abstraction. We used an experimental methodology in which we applied GPT-4 to a dataset of five substantial projects, comprising 4452 code elements and their expert-created abstraction to 338 model elements. Our prompts were informed by an in-depth manual analysis of the dataset, in which we identified stylistic choices that can lead to different generation outcomes and, therefore, are useful to include as hints in the prompt to reflect user preferences. To understand GPT-4’s inherent ability to abstract, we experimented with including hints from the Human Abstraction Framework (HAF), a previous systematization of human abstraction, in the prompts. Our results shed a promising light on the use of GPT-4 for making abstraction decisions at a fine level of granularity (e.g., the inclusion of attribute- and operation-level and type information), where mean F1 scores of 91% and 89% could be achieved, respectively, while more coarse-grained abstraction decisions (especially regarding the representation of relationships) lead to considerably worse F1 scores between 62% and 75%. The inclusion of HAF-based hints in prompts did not significantly affect accuracy, shedding a promising light on GPT-4’s inherent abstraction ability. Our results emphasize the need for further research on understanding the handling of relationships during manual abstraction.
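The abstract reports its results as F1 scores. As a rough illustration of how such a score can be computed when a generated class diagram is compared against an expert-created one, the sketch below treats each diagram as a set of element identifiers and counts exact matches. The function name, the identifiers such as "Order.total", and the exact-match rule are all assumptions made for this example; the listing here does not describe the matching procedure the authors actually used.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    generated: Iterable[str], expected: Iterable[str]
) -> Tuple[float, float, float]:
    """Compare two collections of model-element identifiers by exact match.

    Hypothetical helper for illustration only: real evaluations may match
    elements more leniently (e.g., by normalized names).
    """
    gen, exp = set(generated), set(expected)
    true_positives = len(gen & exp)
    # Precision: share of generated elements that the expert also kept.
    precision = true_positives / len(gen) if gen else 0.0
    # Recall: share of expert-kept elements that were generated.
    recall = true_positives / len(exp) if exp else 0.0
    # F1: harmonic mean of precision and recall (0 if both are 0).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative (made-up) element sets, not data from the paper.
expert_elements = {"Order", "Customer", "Order.total", "Customer.name"}
generated_elements = {"Order", "Customer", "Order.total", "Order.id"}

precision, recall, f1 = precision_recall_f1(generated_elements, expert_elements)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case three of the four generated elements appear in the expert diagram and three of the four expert elements were generated, so precision, recall, and F1 all come out at 0.75. A mean F1 such as the ones quoted in the abstract would then be an average of per-diagram scores of this kind.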

Link to Publication

https://www.jot.fm/contents/issue_2025_02/a14.html

DOI

https://doi.org/10.5381/jot.2025.24.2.a14

Victor Campanello

Chalmers University of Technology, University of Gothenburg

Sweden

Shariq Shahbaz

Chalmers University of Technology, University of Gothenburg

Sweden

Vladislav Indykov

Chalmers | University of Gothenburg

Sweden

Daniel Strüber

Chalmers | University of Gothenburg / Radboud University

Sweden

Session Program

Wed 11 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

13:30 - 15:00	ECMFA Session 5: Maintenance (ECMFA) at M 001, Chair(s): Riccardo Rubei (University of L'Aquila)

13:30 30m Talk		Methodical and Formally Verified Model-Driven Architecture Refactoring (ECMFA): Lars Fischer (Chair of Software Engineering, RWTH Aachen University), Hendrik Kausch (RWTH Aachen University, Chair of Software Engineering), Bernhard Rumpe (RWTH Aachen University), Max Stachon (RWTH Aachen University), Sebastian Stüber (RWTH Aachen University, Chair of Software Engineering), Lucas Wollenhaupt (Chair of Software Engineering, RWTH Aachen University)
14:00 30m Talk		On the Use of GPT-4 in the Reverse Engineering of Class Diagrams (ECMFA): Victor Campanello (Chalmers University of Technology, University of Gothenburg), Shariq Shahbaz (Chalmers University of Technology, University of Gothenburg), Vladislav Indykov (Chalmers | University of Gothenburg), Daniel Strüber (Chalmers | University of Gothenburg / Radboud University)
14:30 30m Talk		Using MDE to support sustainable re-engineering (ECMFA): Dr Kevin Lano (King's College London), Shekoufeh Rahimi (University of Roehampton), Zishan Rahman (King's College London)