Evaluating Multi-Modal LLMs for Automatically Recognizing Semantic Elements in UML Use Case Diagram Images
Requirements engineering commonly employs UML Use Case Diagrams (UCDs) to visually capture system interactions and functionality, facilitating clear communication among stakeholders. Extracting semantic information from UCDs is essential for applications such as automated requirements extraction and system design validation. Recent advances in large language models (LLMs) with visual processing capabilities make it possible to interpret complex diagrammatic content. This paper evaluates how accurately two multi-modal LLMs, GPT-4o and GPT-4o-mini, identify semantic elements within UCDs. We conducted experiments on a novel dataset of UCDs and other diagrams collected from online sources. The results show that both models struggled to accurately identify and interpret key UCD elements, often misclassifying or overlooking essential ones.
Dr. Jameleddine Hassine is an Associate Professor in the Department of Information and Computer Science at King Fahd University of Petroleum and Minerals (KFUPM). Dr. Hassine holds a Ph.D. from the Faculty of Engineering and Computer Science at Concordia University (2008) and an M.Sc. from the School of Information Technology and Engineering (SITE) at the University of Ottawa (2001). Prior to this, he earned a Computer Engineering Diploma from the National School of Computer Science (Tunis, Tunisia) (1997). Dr. Hassine has several years of industrial experience at worldwide telecommunication companies: Nortel Networks (2000-2001) and Cisco Systems (2005-2010). His main research interests include requirements engineering (languages and methods), software testing, formal methods, communication protocols, and software maintenance. Dr. Hassine has published his research in many high-impact journals, such as the Requirements Engineering Journal (REJ), the Journal of Systems and Software (JSS), Information and Software Technology (IST), and Software and Systems Modeling (SoSyM).
