CAIN 2022
Mon 16 - Tue 17 May 2022
co-located with ICSE 2022
Tue 17 May 2022 10:15 - 10:30 at CAIN main room - AI Smells. Chair(s): Ipek Ozkaya, Thomas Zimmermann

High data quality is fundamental for today’s AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets.

Tue 17 May

Displayed time zone: Eastern Time (US & Canada)

09:30 - 11:00
AI Smells (CAIN 2022) at CAIN main room
Chair(s): Ipek Ozkaya (Carnegie Mellon Software Engineering Institute), Thomas Zimmermann (Microsoft Research)
09:30
30m
Other
Activity: Brainwriting
CAIN 2022

10:00
15m
Research paper
Code Smells for Machine Learning Applications
CAIN 2022
Haiyin Zhang (AI for Fintech Research, ING), Luís Cruz (Delft University of Technology), Arie van Deursen (Delft University of Technology, Netherlands)
Pre-print
10:15
15m
Research paper
Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems
CAIN 2022
Harald Foidl (University of Innsbruck), Michael Felderer (University of Innsbruck), Rudolf Ramler (Software Competence Center Hagenberg)
Pre-print
10:30
15m
Research paper
Data Smells in Public Datasets
CAIN 2022
Arumoy Shome (Delft University of Technology), Luís Cruz (Delft University of Technology), Arie van Deursen (Delft University of Technology, Netherlands)
Pre-print
10:45
15m
Other
Discussion on Smells in AI
CAIN 2022

