Evaluation of Data Quality in the Estonian National Health Information System for Digital Decision Support
Following the implementation of Electronic Medical Records (EMR), the amount of digital health data has increased significantly in recent decades. This trend creates an opportunity to share data between different healthcare parties for primary and secondary use. However, the quality of this data is often questioned, and data reuse is still rare. This study evaluates how frequently health data stored in the Estonian Health Information System (EHIS), one of the most advanced digital health platforms (DHP) in the world, is used, and what quality that data has. We collected usage data of the EHIS from its initial release in 2008 until 2021. Between 2016 and 2021, the number of documents submitted to the EHIS per year nearly doubled. Over the same period, approximately nine times more patients and five times more health professionals queried data from the EHIS. This increase in read access indicates that both groups find valuable information in the system. To investigate this further, data from patients with common diseases such as stroke, cancer, or diabetes were queried, analyzed, and compared against the actual data needs from the point of view of healthcare professionals and natural persons. Contrary to what the increase in read access suggests, the manual analysis of the queried data sometimes showed poor data quality and missing information, especially discrepancies between the structured and unstructured parts of the documents shared through the DHP. As an example of varying data quality, we looked at how smoking behavior is reported in the queried data, both in structured form and in free text. We analyzed how the quality of smoking behavior data shifts from document to document using the nine data quality dimensions of the Data Quality Vector. The data quality is shown to shift in seven of these nine dimensions.
While humans seem to be able to screen the data and resolve inconsistencies effectively, the data quality issues observed make it challenging to reuse the data for tasks such as training AI for digital decision support systems.