CAIN 2022
Mon 16 - Tue 17 May 2022
co-located with ICSE 2022
Tue 17 May 2022 08:30 - 08:45 at CAIN main room - AI Models & Pipelines Chair(s): Lucy Ellen Lwakatare

Whereas conventional software components implement predefined specifications, Machine Learning (ML)-enabled Software Components (MLSC) learn the domain specifications from training samples. Thus, an MLSC’s data-driven, inductive reasoning becomes highly reliant on the quality of the training dataset, which is often collected arbitrarily and in an ad hoc manner. This arbitrary collection of samples leads to a significant gap between the actual specification of a real-world concept and the picture of that concept that a dataset presents, reducing MLSC generalizability, particularly in perceptual tasks where understanding the environment is an important factor in accurate prediction.

To fill the gap between the conceptualization of a concept in the targeted domain and its visualization in the MLSC training dataset, we propose exploiting a semantic specification of the concept to identify the variants of the concept that are missing from the dataset. To this end, we propose, first, to semantically specify the hard-to-specify concepts of the MLSC’s targeted domain and, second, to refer to the derived specifications to evaluate the diversity and relative completeness of the datasets collected for the MLSC. Systematically augmenting training datasets with respect to the semantics of the domain improves the quality of an arbitrarily collected dataset and potentially yields more reliable models. As a proof of concept, we automatically acquired existing semantic knowledge to partially specify the automotive domain concept “pedestrian.” Referring to the derived specifications, we augmented state-of-the-art pedestrian datasets. The evaluations show that semantic augmentation outperforms brute-force machine learning in satisfying the MLSC accuracy requirements.
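
The completeness check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `coverage_report`, the variant names, and the per-sample tag sets are all hypothetical, and the sketch assumes each training sample has already been annotated with the concept variants it depicts.

```python
from collections import Counter


def coverage_report(spec_variants, sample_tags):
    """Compare dataset annotations against a semantic specification.

    spec_variants: set of variant names derived from the specification
                   (hypothetical names, chosen for illustration only).
    sample_tags:   iterable of sets, one per training sample, holding the
                   variants that sample is assumed to depict.
    """
    # Count how often each specified variant appears in the dataset.
    counts = Counter(
        tag for tags in sample_tags for tag in tags if tag in spec_variants
    )
    # Variants named in the specification but absent from the dataset.
    missing = sorted(v for v in spec_variants if counts[v] == 0)
    covered = len(spec_variants) - len(missing)
    # Relative completeness: fraction of specified variants that occur.
    completeness = covered / len(spec_variants) if spec_variants else 1.0
    return {"missing": missing, "completeness": completeness, "counts": dict(counts)}


# Hypothetical partial specification of the concept "pedestrian".
spec = {"adult", "child", "wheelchair user", "person with stroller"}
# Hypothetical annotations for three collected samples.
samples = [{"adult"}, {"adult", "child"}, {"adult"}]

report = coverage_report(spec, samples)
print(report["missing"])       # variants to target during augmentation
print(report["completeness"])  # share of specified variants present
```

Under these assumptions, the missing variants would be the candidates for targeted augmentation, as opposed to collecting more samples indiscriminately.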

Tue 17 May

Displayed time zone: Eastern Time (US & Canada)

07:45 - 09:15
AI Models & Pipelines
CAIN 2022 at CAIN main room
Chair(s): Lucy Ellen Lwakatare University of Helsinki
07:45
15m
Industry talk
Practical Insights of Repairing Model Problems on Image Classification
CAIN 2022
Akihito Yoshii Fujitsu Limited, Susumu Tokumoto Fujitsu Limited, Fuyuki Ishikawa National Institute of Informatics
08:00
15m
Research paper
UDAVA: An Unsupervised Learning Pipeline for Sensor Data Validation in Manufacturing
CAIN 2022
Erik Johannes Husom SINTEF Digital, Simeon Tverdal SINTEF Digital, Arda Goknil SINTEF Digital, Sagar Sen
08:15
15m
Research paper
Black-Box Models for Non-Functional Properties of AI Software Systems
CAIN 2022
Daniel Friesel Universität Osnabrück, Olaf Spinczyk Universität Osnabrück
08:30
15m
Research paper
Improving Generalizability of ML-enabled Software through Domain Specification
CAIN 2022
Hamed Barzamini, Mona Rahimi Northern Illinois University, Murtuza Shahzad Northern Illinois University, Hamed Alhoori Northern Illinois University
08:45
15m
Research paper
Data Sovereignty for AI Pipelines: Lessons Learned from an Industrial Project at Mondragon Corporation
CAIN 2022
Marcel Altendeitering Fraunhofer ISST, Julia Pampus Fraunhofer ISST, Felix Larrinaga Mondragon Unibertsitatea, Jon Legaristi Mondragon Unibertsitatea, Falk Howar TU Dortmund University
09:00
15m
Other
Discussion on AI Models & Pipelines
CAIN 2022

