Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations
This program is tentative and subject to change.
Traditional static analysis methods struggle to detect semantic design flaws, such as violations of the SOLID principles, which require a strong understanding of object-oriented design patterns and principles. Existing solutions typically focus on individual SOLID principles or specific programming languages, leaving a gap in the ability to detect violations across all five principles in multi-language codebases. This paper presents a new approach for this task: a methodology that leverages tailored prompt engineering to assess LLMs on their ability to detect SOLID violations across multiple languages (Java, Python, Kotlin, C#).
We present a systematic benchmark of four leading LLMs—CodeLlama:70B, DeepSeekCoder:33B, Qwen2.5 Coder:32B, and GPT-4o Mini—on their ability to detect violations of all five SOLID principles. To enable this evaluation, we construct a new benchmark dataset of 240 manually validated code examples. Using this dataset, we test four distinct prompt strategies inspired by established zero-shot, few-shot, and chain-of-thought techniques to systematically measure their impact on detection accuracy.
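The contrast between zero-shot and few-shot prompting described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual prompts: the function names, template wording, and example format are hypothetical.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompt construction for
# SOLID-violation detection. Templates are illustrative assumptions only.

PRINCIPLES = ["SRP", "OCP", "LSP", "ISP", "DIP"]

def zero_shot_prompt(code: str, principle: str) -> str:
    """Ask the model directly, with no worked examples."""
    return (
        f"Does the following code violate the {principle} principle "
        f"of SOLID? Answer YES or NO and explain briefly.\n\n{code}"
    )

def few_shot_prompt(code: str, principle: str,
                    examples: list[tuple[str, str]]) -> str:
    """Prepend labeled (snippet, verdict) pairs before the query."""
    shots = "\n\n".join(
        f"Code:\n{snippet}\nVerdict: {verdict}"
        for snippet, verdict in examples
    )
    return f"{shots}\n\nCode:\n{code}\nVerdict ({principle})?"
```

A chain-of-thought variant would extend the zero-shot template with an instruction to reason step by step before giving a verdict; an ensemble strategy would aggregate verdicts from several such prompts.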
Our emerging results reveal a complex and surprising performance landscape. We find a stark hierarchy among models, with GPT-4o Mini decisively outperforming the others, yet even it struggles with challenging principles like DIP. Crucially, we show that prompt strategy has a dramatic impact, but no single strategy is universally best; for instance, a deliberative ENSEMBLE prompt excels at OCP detection while a hint-based EXAMPLE prompt is superior for DIP violations. Across all experiments, detection accuracy is heavily influenced by language characteristics and degrades sharply with increasing code complexity. These initial findings demonstrate that effective AI-driven design analysis requires not a single “best” model, but a tailored approach that matches the right model and prompt to the specific design context, highlighting the potential of LLMs to support maintainability through AI-assisted code analysis.
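As a concrete illustration of the kind of flaw the study targets, the sketch below shows a Dependency Inversion Principle (DIP) violation, the principle the abstract singles out as hardest to detect: a high-level class hard-wires a concrete low-level dependency, and a refactored version depends on an abstraction instead. The class names are hypothetical and not taken from the benchmark dataset.

```python
from abc import ABC, abstractmethod

# DIP violation: the high-level ReportService constructs a concrete
# MySQLStore itself, so it cannot be reused with another backend.
class MySQLStore:
    def save(self, data: str) -> str:
        return f"mysql:{data}"

class ReportService:
    def __init__(self):
        self.store = MySQLStore()  # hard-wired concrete dependency

    def publish(self, data: str) -> str:
        return self.store.save(data)

# DIP-compliant version: both the service and the storage backend
# depend on the Store abstraction, injected from outside.
class Store(ABC):
    @abstractmethod
    def save(self, data: str) -> str: ...

class InMemoryStore(Store):
    def __init__(self):
        self.items: list[str] = []

    def save(self, data: str) -> str:
        self.items.append(data)
        return f"memory:{data}"

class ReportServiceDIP:
    def __init__(self, store: Store):  # dependency injected via abstraction
        self.store = store

    def publish(self, data: str) -> str:
        return self.store.save(data)
```

Detecting the first variant requires recognizing that the hard-coded instantiation couples policy to detail, a semantic judgment that plain static analysis rarely makes, which is precisely the gap the prompted LLMs are evaluated on.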
Tue 18 Nov (displayed time zone: Seoul)

16:00 - 17:00

- 16:00, 10m talk: An Empirical Study on UI Overlap in OpenHarmony Applications (Industry Showcase)
- 16:10, 10m talk: Metrics Driven Reengineering and Continuous Code Improvement at Meta (Industry Showcase). Audris Mockus (University of Tennessee), Peter C Rigby (Meta / Concordia University), Rui Abreu (Meta), Nachiappan Nagappan (Meta Platforms, Inc.)
- 16:20, 10m talk: Prompt-with-Me: in-IDE Structured Prompt Management for LLM-Driven Software Engineering (Industry Showcase). Ziyou Li (Delft University of Technology), Agnia Sergeyuk (JetBrains Research), Maliheh Izadi (Delft University of Technology)
- 16:30, 10m talk: Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations (NIER Track). Fatih Pehlivan (Bilkent University), Arçin Ülkü Ergüzen (Bilkent University), Sahand Moslemi Yengejeh (Bilkent University), Mayasah Lami (Bilkent University), Anil Koyuncu (Bilkent University)
- 16:40, 10m talk: Shrunk, Yet Complete: Code Shrinking-Resilient Android Third-Party Library Detection (Industry Showcase). Jingkun Zhang (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jingzheng Wu (Institute of Software, The Chinese Academy of Sciences), Xiang Ling (Institute of Software, Chinese Academy of Sciences), Tianyue Luo (Institute of Software, Chinese Academy of Sciences), Bolin Zhou (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Mutian Yang (Beijing ZhongKeWeiLan Technology Co., Ltd.)
- 16:50, 10m talk: LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution (NIER Track). Karine Even-Mendoza (King’s College London), Alexander E.I. Brownlee (University of Stirling), Alina Geiger (Johannes Gutenberg University Mainz), Carol Hanna (University College London), Justyna Petke (University College London), Federica Sarro (University College London), Dominik Sobania (Johannes Gutenberg-Universität Mainz)