Evidence-driven Data Requirements Engineering and Data Uncertainty Assessment of Machine Learning-based Safety-critical Systems
Reliance on data-centric machine learning (ML) models in complex systems poses numerous challenges for the software engineering process, especially when the system is deployed in a high-risk environment. Due to the inherent uncertainty of such ML models, the safety assurance of these systems is now a primary concern. Recently, many researchers have focused on assuring safe outcomes of ML models. However, little attention has been paid to evaluating the uncertainty of the training data before ML training begins. Currently, there are no specific guidelines on how to perceive, elicit, and specify data requirements, or how to assure data quality, depending on the ML objective and problem domain. To address these gaps, this research provides guidelines for a systematic data requirements engineering and data uncertainty assessment process involving diverse stakeholders. A three-layered framework is proposed that helps explore the data space and elicit verifiable data requirements. Such requirements facilitate the evaluation of the experts' collective confidence in data quality. To accommodate the epistemic uncertainty of such assessments (inconclusive due to lack of knowledge), Dempster-Shafer theory of evidence is used. The application of this theory within the proposed framework aims to address the identified research gaps.
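As a minimal illustration of the evidence-fusion step named above, the sketch below applies Dempster's rule of combination to two hypothetical experts' confidence assessments of a data-quality hypothesis. The frame of discernment ({adequate, inadequate}) and the mass values are assumptions for illustration only, not taken from the paper; mass left on the full frame represents the "don't know" (epistemic) component.

```python
def combine(m1, m2):
    """Dempster's rule of combination: fuse two mass functions.

    Each mass function is a dict mapping frozensets of hypotheses
    (subsets of the frame of discernment) to belief mass summing to 1.
    """
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass on contradictory evidence (K)
    # Renormalize by the non-conflicting mass (1 - K)
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Frame of discernment: the data either satisfies the requirement or not.
OK, BAD = frozenset({"adequate"}), frozenset({"inadequate"})
THETA = OK | BAD  # full frame: "cannot say" — the epistemic residue

# Hypothetical assessments: expert 1 is fairly confident the data is
# adequate; expert 2 leans the same way but withholds half their mass.
expert1 = {OK: 0.7, BAD: 0.1, THETA: 0.2}
expert2 = {OK: 0.5, THETA: 0.5}

fused = combine(expert1, expert2)
# Agreement reinforces belief in OK (~0.842) while the mass left on the
# full frame (~0.105) records the assessment's remaining inconclusiveness.
print(fused)
```

Note how combining two partial assessments raises the collective confidence in the shared hypothesis while still keeping an explicit, quantified residue of ignorance on the full frame, which is the property that motivates using Dempster-Shafer theory over a plain probabilistic average here.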