Quality Evaluation of ML-based Software Systems 2025
ML is a branch of AI focused on systems that make progressively better decisions or predictions by learning from data. Evaluating the quality of ML-based software differs from evaluating traditional software: the former not only addresses distinct and often more complex kinds of problems, but also follows a different development paradigm. With the rapid advance of ML technologies and related techniques, building high-quality ML-based software has become a prominent concern. Quality evaluation of traditional software products is a mature discipline, supported by sound techniques, effective methods, and standards. The same cannot be said of ML-based software quality evaluation. Identifying quality attributes and related metrics, together with effective and objective measurement frameworks, for ML-based software is a necessary step the software community is called to take. Evaluating the quality of ML-based software systems paves the way to certification of these increasingly pervasive technologies, which are nowadays used also in critical contexts such as safety-related applications. The workshop aims to gather software quality experts and practitioners (not necessarily from the Software Engineering community) to share experiences, new ideas, and solutions to the challenges of ML-based software quality evaluation.
QuEMaLeS Workshop Schedule (December 1, 2025):
14:00 Opening and Workshop Introduction
Paper Presentation:
14:15 G. Lami, F. Merola A Survey of Existing Standards Addressing AI-based Technologies
Session 1: ML Quality Evaluation from Process Perspective
Paper Presentations - Session 1:
14:40 F. Falcini, G. Lami Critical Analysis of ASPICE® 4.0 Machine Learning Engineering Process Requirements
15:05 C. Donzella, G. Nicosia, F. Bella, I. Agirre, J. Fernandez, L. Belategi, J. Plazaola Alignment and complementarity between AI-FSM and ASPICE MLE: findings from the assessment of the SAFEXPLAIN Railway Demo
15:30 Coffee break
16:00 Session 1 - Discussion
Session 2: ML Quality Evaluation from Product Perspective
Paper Presentations - Session 2:
16:20 L. Buglione, F. Merola Software Product Quality: Some Thoughts about its Evolution and Perspectives in the AI years
16:45 M. Szwarc, B. Walter Chatting about flaky tests with standard LLMs. An empirical exploration
17:10 Session 2 - Discussion
17:30 Workshop Closure
Accepted Papers
| Title |
|---|
| Alignment and complementarity between AI-FSM and ASPICE MLE: findings from the assessment of the SAFEXPLAIN Railway Demo |
| A Survey of Existing Standards Addressing AI-based Technologies |
| Chatting about flaky tests with standard LLMs. An empirical exploration |
| Critical Analysis of ASPICE® 4.0 Machine Learning Engineering Process Requirements |
| Software Product Quality: Some Thoughts about its Evolution and Perspectives in the AI years |
Call for Papers
The motto “you can’t manage what you can’t measure” still holds for ML-based software systems. The ever-increasing pervasiveness of ML-based solutions in everyday life makes the need to evaluate such solutions ever more urgent. The availability of effective, mature, feasible, and reliable methods to evaluate ML-based software systems is the basis for increasing control over such systems from the perspective of developers, users, and public authorities. With this premise, the workshop aims to bring together software quality experts and practitioners (not necessarily from the Software Engineering community), developers, researchers, managers, and regulators to discuss state-of-the-art approaches to ML-based software quality evaluation and to identify novel approaches and innovative techniques.
The topics of interest include, but are not restricted to, the following:
● Quality models for ML-based software systems
● Quality metrics for training and validation data (including images for vision-based applications)
● Quality metrics for internal and external quality of AI-based software systems
● Non-functional measures of ML-based software systems
● Assessment of ML software development processes
● Certification of ML-based software processes and systems
● Quality case for ML-based software systems
● Educational challenges and needs for ML-based software quality assurance
● Data quality evaluation in the context of ML-based software
● Experiences of quantitative and qualitative quality evaluation of ML-based software systems
● Tools supporting ML-based software quality evaluation
● ISO/IEC 25059 and other standards and regulations for ML-based software product quality evaluation
● ML-based software quality evaluation in specific application domains (e.g. transportation, healthcare, …)
We invite submissions of full papers (max. 12 pages) covering all topics of interest of the workshop. Full papers should present research results, case studies, and experience reports. Short papers (max. 8 pages) are also welcome; they are intended to present ongoing work and preliminary results, as well as new ideas and innovative approaches. Papers (full and short) resulting from collaborations between industry and academia are especially welcome.
Submitted papers should comply with the Springer LNCS Author Guidelines. All papers must be written in English and submitted in PDF format through EasyChair. The EasyChair submission web page for all contributions is: https://easychair.org/my/conference?conf=quemales2025