ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Understanding the architecture is crucial for effectively maintaining and managing large software systems. However, discrepancies often exist between the designed and implemented architectures, which can pose significant risks. To identify these discrepancies, architects need to extract the architecture from the system implementation, which is both time-consuming and error-prone. To simplify this procedure, many automatic architecture recovery techniques have been developed. Yet their accuracy is often limited, so architects must still invest significant effort in refining recovery results to ensure they accurately reflect the implemented architecture.

To reduce such manual effort, we introduce SemRef, a framework that combines LLMs with dependency analysis to automatically refine architectures recovered by existing architecture recovery tools. By leveraging the LLM’s semantic understanding capabilities and integrating structural dependencies, SemRef enhances both the accuracy and the comprehensibility of recovered architectures. We evaluate SemRef on 9 projects with published ground-truth architectures and 10 state-of-the-art architecture recovery tools, using 5 commonly used metrics to measure its effectiveness.

The results show that SemRef improves accuracy across various metrics, with normalized gains ranging from 17.72% to 43.35%. Specifically, for the MoJoFM and $a2a_{adj}$ metrics, SemRef achieves relative improvements of 118.57% and 100.41%, respectively. Moreover, SemRef is highly scalable: it maintains stable performance across projects ranging from thousands to millions of lines of code, with cost scaling linearly with project size. Further, we test SemRef with various LLMs to demonstrate its generalizability across different models. Beyond improving accuracy, the integration of LLMs enables SemRef to provide a structured module hierarchy and hierarchical module summaries, which further enhance the comprehensibility of recovered architectures.