Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations
This program is tentative and subject to change.
Traditional static analysis methods struggle to detect semantic design flaws, such as violations of the SOLID principles, which require a strong understanding of object-oriented design patterns and principles. Existing solutions typically focus on individual SOLID principles or specific programming languages, leaving a gap in the ability to detect violations across all five principles in multi-language codebases. This paper presents a new approach for this task: a methodology that leverages tailored prompt engineering to assess LLMs on their ability to detect SOLID violations across multiple languages (Java, Python, Kotlin, C#).
We present a systematic benchmark of four leading LLMs—CodeLlama:70B, DeepSeekCoder:33B, Qwen2.5 Coder:32B, and GPT-4o Mini—on their ability to detect violations of all five SOLID principles. To enable this evaluation, we construct a new benchmark dataset of 240 manually validated code examples. Using this dataset, we test four distinct prompt strategies inspired by established zero-shot, few-shot, and chain-of-thought techniques to systematically measure their impact on detection accuracy.
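The contrast between zero-shot and few-shot prompting described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual prompts: the function names, template wording, and example format are hypothetical.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompt construction for
# SOLID-violation detection. Templates are illustrative assumptions only.

PRINCIPLES = ["SRP", "OCP", "LSP", "ISP", "DIP"]

def zero_shot_prompt(code: str, principle: str) -> str:
    """Ask the model directly, with no worked examples."""
    return (
        f"Does the following code violate the {principle} principle "
        f"of SOLID? Answer YES or NO and explain briefly.\n\n{code}"
    )

def few_shot_prompt(code: str, principle: str,
                    examples: list[tuple[str, str]]) -> str:
    """Prepend labeled (snippet, verdict) pairs before the query."""
    shots = "\n\n".join(
        f"Code:\n{snippet}\nVerdict: {verdict}"
        for snippet, verdict in examples
    )
    return f"{shots}\n\nCode:\n{code}\nVerdict ({principle})?"
```

A chain-of-thought variant would extend the zero-shot template with an instruction to reason step by step before giving a verdict; an ensemble strategy would aggregate verdicts from several such prompts.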
Our emerging results reveal a complex and surprising performance landscape. We find a stark hierarchy among models, with GPT-4o Mini decisively outperforming the others, yet even it struggles with challenging principles like DIP. Crucially, we show that prompt strategy has a dramatic impact, but no single strategy is universally best; for instance, a deliberative ENSEMBLE prompt excels at OCP detection while a hint-based EXAMPLE prompt is superior for DIP violations. Across all experiments, detection accuracy is heavily influenced by language characteristics and degrades sharply with increasing code complexity. These initial findings demonstrate that effective AI-driven design analysis requires not a single “best” model, but a tailored approach that matches the right model and prompt to the specific design context, highlighting the potential of LLMs to support maintainability through AI-assisted code analysis.
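As a concrete illustration of the kind of flaw the study targets, the sketch below shows a Dependency Inversion Principle (DIP) violation, the principle the abstract singles out as hardest to detect: a high-level class hard-wires a concrete low-level dependency, and a refactored version depends on an abstraction instead. The class names are hypothetical and not taken from the benchmark dataset.

```python
from abc import ABC, abstractmethod

# DIP violation: the high-level ReportService constructs a concrete
# MySQLStore itself, so it cannot be reused with another backend.
class MySQLStore:
    def save(self, data: str) -> str:
        return f"mysql:{data}"

class ReportService:
    def __init__(self):
        self.store = MySQLStore()  # hard-wired concrete dependency

    def publish(self, data: str) -> str:
        return self.store.save(data)

# DIP-compliant version: both the service and the storage backend
# depend on the Store abstraction, injected from outside.
class Store(ABC):
    @abstractmethod
    def save(self, data: str) -> str: ...

class InMemoryStore(Store):
    def __init__(self):
        self.items: list[str] = []

    def save(self, data: str) -> str:
        self.items.append(data)
        return f"memory:{data}"

class ReportServiceDIP:
    def __init__(self, store: Store):  # dependency injected via abstraction
        self.store = store

    def publish(self, data: str) -> str:
        return self.store.save(data)
```

Detecting the first variant requires recognizing that the hard-coded instantiation couples policy to detail, a semantic judgment that plain static analysis rarely makes, which is precisely the gap the prompted LLMs are evaluated on.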
Tue 18 Nov (displayed time zone: Seoul)

16:00 - 17:00

- 16:00, 10m talk: An Empirical Study on UI Overlap in OpenHarmony Applications (Industry Showcase)
- 16:10, 10m talk: Metrics Driven Reengineering and Continuous Code Improvement at Meta (Industry Showcase). Audris Mockus (University of Tennessee), Peter C Rigby (Meta / Concordia University), Rui Abreu (Meta), Nachiappan Nagappan (Meta Platforms, Inc.)
- 16:20, 10m talk: Prompt-with-Me: in-IDE Structured Prompt Management for LLM-Driven Software Engineering (Industry Showcase). Ziyou Li (Delft University of Technology), Agnia Sergeyuk (JetBrains Research), Maliheh Izadi (Delft University of Technology)
- 16:30, 10m talk: Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations (NIER Track). Fatih Pehlivan (Bilkent University), Arçin Ülkü Ergüzen (Bilkent University), Sahand Moslemi Yengejeh (Bilkent University), Mayasah Lami (Bilkent University), Anil Koyuncu (Bilkent University)
- 16:40, 10m talk: Shrunk, Yet Complete: Code Shrinking-Resilient Android Third-Party Library Detection (Industry Showcase). Jingkun Zhang (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Jingzheng Wu (Institute of Software, The Chinese Academy of Sciences), Xiang Ling (Institute of Software, Chinese Academy of Sciences), Tianyue Luo (Institute of Software, Chinese Academy of Sciences), Bolin Zhou (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences), Mutian Yang (Beijing ZhongKeWeiLan Technology Co., Ltd.)
- 16:50, 10m talk: LLM-Guided Genetic Improvement: Envisioning Semantic Aware Automated Software Evolution (NIER Track). Karine Even-Mendoza (King’s College London), Alexander E.I. Brownlee (University of Stirling), Alina Geiger (Johannes Gutenberg University Mainz), Carol Hanna (University College London), Justyna Petke (University College London), Federica Sarro (University College London), Dominik Sobania (Johannes Gutenberg-Universität Mainz)