CADE: The Missing Benchmark in Evaluating Dataset Requirements of AI-enabled Software (Requirements Engineering 2022 - Research Papers)

Who

Mona Rahimi, Hamed Barzamini

Track

Requirements Engineering 2022 Research Papers

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 18 Aug 2022 21:40 - 22:10 at Wallaby, in the session Requirements Engineering for AI. Chair(s): Seok-Won Lee

Abstract

The inductive nature of artificial neural models makes dataset quality a key factor in their proper functioning. For this reason, multiple research studies have proposed metrics to assess the quality of the models’ datasets, such as dataset correctness, completeness, and consistency. However, these studies commonly lack a point of reference against which the proposed quality metrics can be assessed.

To this end, this paper proposes a generic process that extracts the necessary knowledge to build a reliable reference point for the purpose of explanation, assessment, and augmentation of the AI-software dataset. This process automatically builds a benchmark specific to the software operational domain, interprets the training and validation datasets of AI-enabled software systems, and evaluates the dataset semantic quality and completeness relative to the benchmark. We implemented this process within a framework called Concept Augmentation and Dataset Evaluation (CADE), which leverages a series of novel natural language and image processing techniques to construct a semantic benchmark with respect to the domain specifications.
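The idea of evaluating a dataset relative to a benchmark can be illustrated with a toy sketch. This is not the CADE implementation, which the abstract does not detail. The function name `evaluate_against_benchmark`, the `min_share` threshold, the concept names, and the completeness formula are all assumptions made for illustration. The sketch assumes the benchmark is simply a set of domain concepts and the dataset is a flat list of annotation labels.

```python
from collections import Counter


def evaluate_against_benchmark(benchmark_concepts, dataset_labels, min_share=0.05):
    """Compare dataset labels with a set of domain concepts.

    Hypothetical metric for illustration only; not the method used by CADE.
    """
    benchmark_concepts = set(benchmark_concepts)
    # Count only labels that correspond to a benchmark concept.
    counts = Counter(label for label in dataset_labels if label in benchmark_concepts)
    total = sum(counts.values())
    covered = set(counts)
    # Concepts the domain requires but the dataset never shows.
    missing = sorted(benchmark_concepts - covered)
    # Concepts present, but with a share below the assumed threshold.
    under = sorted(c for c in covered if counts[c] / total < min_share)
    completeness = len(covered) / len(benchmark_concepts) if benchmark_concepts else 0.0
    return {
        "completeness": completeness,
        "missing": missing,
        "under_represented": under,
    }


# Made-up example data for an autonomous driving domain.
benchmark = {"pedestrian", "cyclist", "traffic light", "stop sign"}
labels = (
    ["pedestrian"] * 40
    + ["traffic light"] * 58
    + ["cyclist"] * 2
    + ["billboard"] * 10  # not a benchmark concept, so it is ignored
)
report = evaluate_against_benchmark(benchmark, labels)
print(report)
```

With this made-up data, three of the four benchmark concepts appear in the labels, "stop sign" is missing entirely, and "cyclist" falls below the assumed 5% share, which mirrors the kind of divergence and under-representation the abstract describes.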

The application of CADE to three commonly used autonomous driving datasets revealed several weaknesses shared by these arbitrarily collected datasets when compared against the encoded domain specifications: the datasets diverge from the domain concepts, and variations of those concepts are under-represented in the data. The qualitative evaluation showed that, on average, about 75% of CADE-generated topics were relevant.

Mona Rahimi

Northern Illinois University

United States

Hamed Barzamini