Quality Evaluation of ML-based Software Systems 2025
ML is a branch of AI focused on systems that make progressively better decisions or predictions by learning from data. Evaluating the quality of ML-based software differs from evaluating traditional software: the former not only addresses distinct and often more complex kinds of problems, but also follows a different development paradigm. With the rapid advance of ML technologies and related techniques, building high-quality ML-based software has become a prominent concern. Quality evaluation of traditional software products is a mature discipline, supported by sound techniques, effective methods, and standards. The same cannot be said of ML-based software quality evaluation. Identifying quality attributes and related metrics, together with effective and objective measurement frameworks, for ML-based software is a necessary step the software community is called to take. Evaluating the quality of ML-based software systems paves the way to certification of these increasingly pervasive technologies, which are nowadays used also in critical contexts such as safety-related applications. The workshop aims to gather software quality experts and practitioners (not necessarily from the Software Engineering community) to share experiences, new ideas, and solutions to the challenges of ML-based software quality evaluation.
QuEMaLeS Workshop Schedule (December 1, 2025):
14:00 Opening and Workshop Introduction
Paper Presentation:
14:15 G. Lami, F. Merola A Survey of Existing Standards Addressing AI-based Technologies
Session 1: ML Quality Evaluation from Process Perspective
Paper Presentations - Session 1:
14:40 F. Falcini, G. Lami Critical Analysis of ASPICE® 4.0 Machine Learning Engineering Process Requirements
15:05 C. Donzella, G. Nicosia, F. Bella, I. Agirre, J. Fernandez, L. Belategi, J. Plazaola Alignment and complementarity between AI-FSM and ASPICE MLE: findings from the assessment of the SAFEXPLAIN Railway Demo
15:30 Coffee break
16:00 Session 1 - Discussion
Session 2: ML Quality Evaluation from Product Perspective
Paper Presentations - Session 2:
16:20 L. Buglione, F. Merola Software Product Quality: Some Thoughts about its Evolution and Perspectives in the AI years
16:45 M. Szwarc, B. Walter Chatting about flaky tests with standard LLMs. An empirical exploration
17:10 Session 2 - Discussion
17:30 Workshop Closure
Accepted Papers
| Title |
|---|
| Alignment and complementarity between AI-FSM and ASPICE MLE: findings from the assessment of the SAFEXPLAIN Railway Demo |
| A Survey of Existing Standards Addressing AI-based Technologies |
| Chatting about flaky tests with standard LLMs. An empirical exploration |
| Critical Analysis of ASPICE® 4.0 Machine Learning Engineering Process Requirements |
| Software Product Quality: Some Thoughts about its Evolution and Perspectives in the AI years |
Call for Papers
The motto “you can’t manage what you can’t measure” still holds for ML-based software systems. The ever-increasing pervasiveness of ML-based solutions in everyday life makes the need to evaluate such solutions ever more urgent. The availability of effective, mature, feasible, and reliable methods to evaluate ML-based software systems is the basis for increasing control over such systems from the perspective of developers, users, and public authorities. With this premise, the workshop aims to bring together software quality experts and practitioners (not necessarily from the Software Engineering community), developers, researchers, managers, and regulators to discuss state-of-the-art approaches to ML-based software quality evaluation and to identify novel approaches and innovative techniques.
The topics of interest include, but are not restricted to, the following:
● Quality models for ML-based software systems
● Quality metrics for training and validation data (including images for vision-based applications)
● Quality metrics for internal and external quality of AI-based software systems
● Non-functional measures of ML-based software systems
● Assessment of ML software development processes
● Certification of ML-based software processes and systems
● Quality case for ML-based software systems
● Educational challenges and needs for ML-based software quality assurance
● Data quality evaluation in the context of ML-based software
● Experiences of quantitative and qualitative quality evaluation of ML-based software systems
● Tools supporting ML-based software quality evaluation
● ISO/IEC 25059 and other standards and regulations for ML-based software product quality evaluation
● ML-based software quality evaluation in specific application domains (e.g. transportation, healthcare, …)
We invite submissions of full papers (max. 12 pages) covering all topics of interest of the workshop. Full papers should present research results, case studies, and experience reports. Short papers (max. 8 pages) are also welcome; they are intended to present ongoing work and preliminary results, as well as new ideas and innovative approaches. Papers (full and short) resulting from collaborations between industry and academia are especially welcome.
Submitted papers should comply with the Springer LNCS Author Guidelines. All papers must be written in English and submitted in PDF format through EasyChair. The EasyChair submission web page for all contributions is: https://easychair.org/my/conference?conf=quemales2025