ICSE 2025
Sat 26 April - Sun 4 May 2025, Ottawa, Ontario, Canada

Managing design-level complexity in industrial software systems remains challenging and often results in error-prone, hard-to-maintain codebases. Despite extensive research on metrics and refactoring tools, architects frequently rely more on intuition than on algorithmic approaches, which highlights the need for techniques that better align with expert judgment. This research proposes that large language models (LLMs) trained specifically to generate a "concern space" can organize program entities by shared concerns, enabling more meaningful metrics, refactoring suggestions, and system-level design views. Initial work with ConcernBERT, a purpose-trained LLM, shows significant improvements over traditional concept-based methods in representing cohesion. ConcernBERT uses contrastive learning: embeddings are learned by placing entities that address similar concerns close together and pushing unrelated entities apart. Complementing this, the Deicide algorithm identifies responsibility modules within classes and generates decomposition recommendations that align with historical maintenance patterns. Preliminary results are promising: ConcernBERT shows strong performance in embedding entities by concern, closely matching expert-annotated ground truth. Future work will apply these techniques across entire software systems.
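To make the contrastive-learning idea concrete, here is a minimal sketch. The abstract does not specify ConcernBERT's actual loss, so this assumes a classic margin-based pairwise contrastive loss; the entity names, toy 2-D embeddings, concern labels, and `margin` value are all hypothetical and are not ConcernBERT output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Sequence[float], b: Sequence[float],
                     same_concern: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of entity embeddings.

    Same-concern pairs pay their squared distance, so minimizing the loss pulls
    them together; different-concern pairs pay only while they are closer than
    `margin`, so minimizing the loss pushes them apart up to that margin.
    """
    d = euclidean(a, b)
    if same_concern:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_loss(embeddings: Dict[str, List[float]],
               pairs: List[Tuple[str, str, bool]],
               margin: float = 1.0) -> float:
    """Mean contrastive loss over labeled (entity, entity, same_concern) pairs."""
    total = sum(contrastive_loss(embeddings[i], embeddings[j], same, margin)
                for i, j, same in pairs)
    return total / len(pairs)


# Hypothetical program entities with toy 2-D embeddings (assumption, not real data).
embeddings = {
    "Order.total": [0.0, 0.0],
    "Order.addItem": [0.1, 0.0],
    "Logger.write": [0.3, 0.4],
}
pairs = [
    ("Order.total", "Order.addItem", True),   # labeled as sharing a concern
    ("Order.total", "Logger.write", False),   # labeled as unrelated concerns
]
print(batch_loss(embeddings, pairs))
```

In this toy batch the same-concern pair contributes 0.1² and the unrelated pair contributes (1 - 0.5)² because it still sits inside the margin; a training loop would update the encoder to shrink both terms, which is what clusters entities by concern.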