Detecting Quality Problems in Research Data: A Model-Driven Approach
Scientific progress depends heavily on the quality of research data, and the scientific community therefore places strict requirements on data quality. A major challenge in data quality assurance is to localise quality problems that are inherent to data collections. In this paper, we present the results of a qualitative study on quality problems occurring in cultural heritage data. To cope with the dynamic digitalisation of the humanities, we present a model-driven approach to analysing the quality of research data that abstracts from the underlying database technology. Based on the observation that many of the identified quality problems manifest as anti-patterns, a data engineer formulates analysis patterns that are generic with respect to database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a domain-specific database format. Data analysts then use the resulting concrete patterns to locate quality problems in their databases. As a proof of concept, we implemented tool support that realises this approach for XML databases. We evaluated our approach with respect to expressiveness and performance.
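To make the step from a technology-specific pattern to a concrete pattern more tangible, the following is a minimal sketch in Python. It is not the tool support described in the abstract. The anti-pattern chosen here ("a record whose mandatory field is missing or blank"), the class name `EmptyFieldPattern`, and the element names `collection`, `object`, and `title` are all assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyFieldPattern:
    """Hypothetical XML-specific anti-pattern with two open parameters."""
    record_tag: str  # element that represents one record
    field_tag: str   # child element that must carry a non-empty value

    def find(self, root):
        """Return the records in which the field is missing or blank."""
        hits = []
        for record in root.iter(self.record_tag):
            values = [f.text for f in record.findall(self.field_tag)]
            if not any(v and v.strip() for v in values):
                hits.append(record)
        return hits


# Invented sample data; a real collection would use its own schema.
SAMPLE = """<collection>
  <object id="o1"><title>Vase</title></object>
  <object id="o2"><title>  </title></object>
  <object id="o3"></object>
</collection>"""

# Concretisation: bind the open parameters to a domain-specific format.
untitled_object = EmptyFieldPattern(record_tag="object", field_tag="title")

# Application: locate the records that match the concrete pattern.
problems = [r.get("id") for r in untitled_object.find(ET.fromstring(SAMPLE))]
print(problems)
```

In this sketch the class plays the role of a pattern already adapted to XML, binding its parameters corresponds to the domain expert's concretisation, and calling `find` corresponds to the data analyst applying the concrete pattern to a database.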
Thu 22 Oct, 15:00 - 15:20, Eastern Time (US & Canada)
Technical Track