On the Use of GPT-4 in the Reverse Engineering of Class Diagrams
Class diagrams are a standard notation for visualizing the structure of a software system in the context of software design and analysis. In particular, class diagrams are widely used in reverse engineering, the main goal of which is to reconstruct and analyze the design of a system from a given codebase in order to understand and improve it. Yet, traditional reverse engineering tools that generate class diagrams from code often produce cluttered outputs because they cannot perform abstraction, that is, leave out or summarize nonessential elements the way human experts would.

In this paper, we explore the use of large language models, specifically GPT-4, in generating class diagrams from code to emulate human abstraction. We used an experimental methodology in which we applied GPT-4 to a dataset of five substantial projects, comprising 4452 code elements and their expert-created abstraction into 338 model elements. Our prompts were informed by an in-depth manual analysis of the dataset, in which we identified stylistic choices that can lead to different generation outcomes and are therefore useful to include as hints in the prompt to reflect user preferences. To understand GPT-4’s inherent ability to abstract, we experimented with including hints from the Human Abstraction Framework (HAF), a previous systematization of human abstraction, in the prompts. Our results shed a promising light on the use of GPT-4 for making abstraction decisions at a fine level of granularity (e.g., the inclusion of attribute- and operation-level information and of type information), where mean F1 scores of 91% and 89%, respectively, were achieved, while more coarse-grained abstraction decisions (especially regarding the representation of relationships) led to considerably worse F1 scores between 62% and 75%. The inclusion of HAF-based hints in the prompts did not significantly affect accuracy, which speaks for GPT-4’s inherent abstraction ability. Our results emphasize the need for further research on understanding the handling of relationships during manual abstraction.
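As an illustration of the F1 scores reported above, the following minimal sketch computes precision, recall, and F1 for a generated diagram against an expert-created one. It assumes that elements are compared by exact name match, which is not necessarily the matching procedure used in the paper; the function name `f1_score` and the element names are hypothetical.

```python
def f1_score(predicted: set[str], expected: set[str]) -> float:
    """F1 of predicted model elements against the expert's elements.

    Assumption: two elements match only if their names are identical.
    """
    true_positives = len(predicted & expected)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical attribute-level elements, for illustration only.
expert = {"Order.id", "Order.total", "Customer.name", "Customer.email"}
generated = {"Order.id", "Order.total", "Customer.name", "Customer.phone"}

score = f1_score(generated, expert)
print(f"F1 = {score:.2f}")
```

Here three of the four generated elements also appear in the expert diagram, so precision and recall are both 0.75 and the F1 score is 0.75 as well.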
Wed 11 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
13:30 - 15:00

13:30 (30m, Talk) | Methodical and Formally Verified Model-Driven Architecture Refactoring | ECMFA
Lars Fischer (Chair of Software Engineering, RWTH Aachen University), Hendrik Kausch (RWTH Aachen University, Chair of Software Engineering), Bernhard Rumpe (RWTH Aachen University), Max Stachon (RWTH Aachen University), Sebastian Stüber (RWTH Aachen University, Chair of Software Engineering), Lucas Wollenhaupt (Chair of Software Engineering, RWTH Aachen University)

14:00 (30m, Talk) | On the Use of GPT-4 in the Reverse Engineering of Class Diagrams | ECMFA
Victor Campanello (Chalmers University of Technology, University of Gothenburg), Shariq Shahbaz (Chalmers University of Technology, University of Gothenburg), Vladislav Indykov (Chalmers | University of Gothenburg), Daniel Strüber (Chalmers | University of Gothenburg / Radboud University)

14:30 (30m, Talk) | Using MDE to support sustainable re-engineering | ECMFA
Dr Kevin Lano (King's College London), Shekoufeh Rahimi (University of Roehampton), Zishan Rahman (King's College London)