Quality-Driven Machine Learning-based Data Science Pipeline Realization: a software engineering approach
The recent wide adoption of data science approaches to decision making in several application domains (such as health, business, and even education) opens new challenges in the engineering and implementation of these systems. Within the broader landscape of data science, machine learning is the most widely used technique and, due to its characteristics, we believe that better engineering methodologies and tools are needed to realize innovative data-driven systems able to satisfy emerging quality attributes (such as debiasing and fairness, explainability, privacy and ethics, and sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines, and study their relationships; ii) define a new software engineering approach for the development of data science systems that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines, starting from requirements engineering. Moreover, in this paper we present some details of the project, showing how feature models and model-driven engineering can be leveraged to realize it.
Giordano d’Aloisio is a Ph.D. student in Software engineering and intelligent systems at the University of L’Aquila, Italy. He is also a member of the Territori Aperti and PinKamP projects. He earned a bachelor’s degree in Computer Science for Business Economy at the University of Chieti-Pescara and a master’s degree in Computer Science at the University of L’Aquila. He also holds a master’s in Mobile and Web Technologies from the University of L’Aquila. His research focuses mainly on quality aspects of machine learning systems, with particular attention to the bias and fairness of machine learning algorithms.