EASE 2024
Tue 18 - Fri 21 June 2024 Salerno, Italy

Data quality is critical to making data-driven business decisions reliable, accurate, and complete. As a result, data quality assessment has become a prevalent topic among researchers and practitioners. Previous studies have surveyed data quality tools and used them in industrial projects. However, practitioners’ adoption of data quality tools remains largely unexplored. In this study, we systematically selected five widely used tools and analyzed 498 GitHub repositories that use them. Our findings show that practitioners increasingly use data quality tools to assess and improve the quality of their data. The most common use case is software development (49.2%), followed by learning (25.5%), teaching (9.2%), and research (0.4%). However, most repositories (69%) showed low levels of activity and collaboration. Development projects, especially those with industry-based owners, were the most active. The tools were used predominantly to assess the completeness of the data, followed by uniqueness, validity, and referential integrity.