Improving Generalizability of ML-enabled Software through Domain Specification
While conventional software components implement pre-defined specifications, Machine Learning (ML)-enabled Software Components (MLSC) learn the domain specifications from training samples. The data-driven, inductive reasoning of MLSC therefore relies heavily on the quality of the training dataset, which is often collected in an arbitrary, ad hoc manner. Such random sample collection leads to a significant gap between the actual specification of a real-world concept and the picture the dataset paints of that concept, reducing MLSC generalizability, particularly in perceptual tasks where understanding the environment is essential for accurate prediction.
To fill the gap between the conceptualization of a targeted domain concept and its visual representation in the MLSC training dataset, we propose exploiting a semantic specification of the concept to identify the concept's missing variants in the dataset. To this end, we propose to first semantically specify the hard-to-specify concepts of the MLSC's targeted domain and, second, use the derived specifications to evaluate the diversity and relative completeness of the collected MLSC datasets. Systematically augmenting training datasets with respect to the semantics of the domain improves the quality of an arbitrarily collected dataset and potentially yields more reliable models. As a proof of concept, we automatically acquired existing semantic knowledge to partially specify the automotive-domain concept \textit{``pedestrian.''} Using the derived specifications, we augmented state-of-the-art pedestrian datasets. Our evaluations show that semantic augmentation outperforms brute-force machine learning in satisfying MLSC accuracy requirements.
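To illustrate the second step, the minimal Python sketch below shows one way a derived specification could be used to measure a dataset's relative completeness and list its missing concept variants. It is not the paper's implementation: the dimension names (age group, pose, occlusion), their variant values, and the per-sample annotation format are hypothetical placeholders standing in for whatever the acquired semantic knowledge actually provides.
\begin{verbatim}
from itertools import product
from collections import Counter

# Hypothetical semantic specification of the concept "pedestrian":
# each dimension lists the variants assumed to be derived from
# existing domain knowledge.
SPEC = {
    "age_group": ["child", "adult", "elderly"],
    "pose": ["walking", "standing", "running", "sitting"],
    "occlusion": ["none", "partial", "heavy"],
}

def coverage_report(samples):
    """Compare the variants present in a dataset's annotations against
    the specification and return the missing variant combinations.

    `samples` is assumed to be an iterable of dicts whose keys match the
    specification dimensions, e.g.
    {"age_group": "adult", "pose": "walking", "occlusion": "none"}.
    """
    dims = list(SPEC)
    # Count which variant combinations the dataset actually contains.
    observed = Counter(tuple(s[d] for d in dims) for s in samples)
    # All combinations the specification requires.
    required = set(product(*(SPEC[d] for d in dims)))
    missing = sorted(required - set(observed))
    covered = len(required) - len(missing)
    return {
        "relative_completeness": covered / len(required),
        "missing_variants": [dict(zip(dims, combo)) for combo in missing],
    }

if __name__ == "__main__":
    dataset = [
        {"age_group": "adult", "pose": "walking", "occlusion": "none"},
        {"age_group": "adult", "pose": "standing", "occlusion": "partial"},
    ]
    report = coverage_report(dataset)
    print(f"relative completeness: {report['relative_completeness']:.0%}")
    print("first missing variant:", report["missing_variants"][0])
\end{verbatim}
In this reading, the missing-variant list would drive the augmentation step: each unsatisfied combination points to samples that should be collected or synthesized before the dataset is considered complete with respect to the domain semantics.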