Enabling Architecture Traceability by LLM-based Architecture Component Name Extraction (ICSA 2025 - Research Papers)

Who

Dominik Fuchß, Haoyu Liu, Tobias Hey, Jan Keim, Anne Koziolek

Track

ICSA 2025 Research Papers

Time Zone

The program is currently displayed in (GMT+02:00) Brussels, Copenhagen, Madrid, Paris.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 2 Apr 2025 12:45 - 13:00 at Main Hall (O100) - AI and Machine Learning in Software Architecture I Chair(s): Henry Muccini

Abstract

Traceability Link Recovery (TLR) is an enabler for various software engineering tasks. One important task is the recovery of trace links between Software Architecture Documentation (SAD) and source code. Here, the main challenge is the semantic gap between the two artifact types. Recent research has shown that this semantic gap can be bridged by using Software Architecture Models (SAMs) as intermediates. However, the creation of SAMs is a manual and time-consuming task. This paper investigates the use of Large Language Models (LLMs) to extract component names as simple SAMs for TLR based on SAD and source code. By doing so, we aim to bridge the semantic gap between SAD and source code without the need for manual SAM creation. We compare our approach to the state-of-the-art TLR approaches TransArC and ArDoCode. TransArC is currently the best-performing approach for TLR between SAD and source code, but it requires SAMs as an additional artifact. Our evaluation shows that our approach performs comparably to TransArC (weighted average F1 with GPT-4o: 0.86 vs. TransArC’s 0.87), while only needing the SAD and source code. Moreover, our approach significantly outperforms the best baseline that does not need SAMs (weighted average F1 with GPT-4o: 0.86 vs. ArDoCode’s 0.62). In summary, our approach shows that LLMs can be used to make TLR between SAD and source code more applicable by extracting component names and eliminating the need for manually created SAMs.
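The abstract compares approaches by a weighted average F1 score. The following is a minimal sketch of how such a score could be computed across several evaluation projects. It is not the authors' evaluation code: the `ProjectResult` type, the function names, and the choice to weight each project by its number of expected trace links are all assumptions made for illustration, and the numbers in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProjectResult:
    """Trace link counts for one project (hypothetical structure)."""
    name: str
    true_positives: int   # recovered links that are in the gold standard
    false_positives: int  # recovered links that are not in the gold standard
    false_negatives: int  # gold standard links that were missed


def f1_score(result: ProjectResult) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * result.true_positives + result.false_positives + result.false_negatives
    if denominator == 0:
        return 0.0
    return 2 * result.true_positives / denominator


def weighted_average_f1(results: Iterable[ProjectResult]) -> float:
    """Average per-project F1, weighted by expected links (TP + FN).

    Assumption: the weight is the number of gold standard links per project.
    """
    results = list(results)
    total_weight = sum(r.true_positives + r.false_negatives for r in results)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        f1_score(r) * (r.true_positives + r.false_negatives) for r in results
    )
    return weighted_sum / total_weight


# Invented example values, not taken from the paper.
example = [
    ProjectResult("project-a", true_positives=8, false_positives=2, false_negatives=2),
    ProjectResult("project-b", true_positives=30, false_positives=0, false_negatives=0),
]
score = weighted_average_f1(example)
```

With this weighting, projects with more expected trace links contribute more to the overall score than small projects, which is one common reason to report a weighted rather than a plain average.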

Link to Publication

https://fuchss.org/assets/pdf/2025/icsa-25.pdf

Dominik Fuchß, Karlsruhe Institute of Technology (KIT), Germany

Haoyu Liu, Karlsruhe Institute of Technology (KIT), Germany

Tobias Hey, Karlsruhe Institute of Technology (KIT), Germany

Jan Keim, Karlsruhe Institute of Technology (KIT), Germany

Anne Koziolek, Karlsruhe Institute of Technology, Germany

Session Program

Wed 2 Apr
Displayed time zone: Brussels, Copenhagen, Madrid, Paris

12:30 - 13:30	AI and Machine Learning in Software Architecture I (Research Papers / New and Emerging Ideas) at Main Hall (O100). Chair(s): Henry Muccini, University of L'Aquila, Italy

12:30 15m Research paper	LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World. Research Papers. Shrikara Arun; Meghana Tedla (SERC, IIIT Hyderabad, India); Karthik Vaidhyanathan (IIIT Hyderabad)
12:45 15m Research paper	Enabling Architecture Traceability by LLM-based Architecture Component Name Extraction. Research Papers. Dominik Fuchß (Karlsruhe Institute of Technology (KIT)); Haoyu Liu (Karlsruhe Institute of Technology (KIT)); Tobias Hey (Karlsruhe Institute of Technology (KIT)); Jan Keim (Karlsruhe Institute of Technology (KIT)); Anne Koziolek (Karlsruhe Institute of Technology)
13:00 15m Paper	A Functional Software Reference Architecture for LLM-Integrated Systems. New and Emerging Ideas. Alessio Bucaioni (Mälardalen University); Martin Weyssow (DIRO, Université de Montréal); Junda He (Singapore Management University); Yunbo Lyu (Singapore Management University); David Lo (Singapore Management University)
13:15 15m Research paper	Do Large Language Models Contain Software Architectural Knowledge? An Exploratory Case Study with GPT. Research Papers. Mohamed Soliman (Paderborn University); Jan Keim (Karlsruhe Institute of Technology (KIT))