
Reusing poor-quality data has limited value. When developing the requirements for the AIDAVA curation virtual assistant, data users repeatedly asked the same question: how reliable is the data? The answer differs depending on the state of the data:
i) For data sources, a quality label can be established from the quality level provided by the data holder (if available), including the credentials of the persons who created and validated the data.
ii) For curated data (i.e. the PHKG), the quality label is linked to the quality of the source, the level of quality and certification of the curation tools used during transformation, the health literacy of the humans who provided answers when there were semantic gaps, and the number of data quality checks that could not be resolved.
iii) For published data, the quality label is linked to the quality level of the curated data, compliance with the target format, completeness of the content, absence of bias, and the quality, reliability, and certification of the imputation algorithm, if applicable.

This document provides a detailed overview of AIDAVA deliverable 4.6, focusing on data quality and metadata across the health data life cycle. This deliverable is a key component of AIDAVA, aimed at developing a comprehensive data quality assessment methodology. This methodology is crucial for ensuring the reliability, transparency, and effective reuse of health data. The document highlights the importance of maintaining high standards of health data quality and covers data quality dimensions, methodologies, and tools. Furthermore, deliverable 4.6 is linked with other integral parts of the project, namely deliverables 1.3 (Business requirements for R1) [1], 1.4 (Definition of assessment study including test scenarios & metrics, and study initiation package) [2], 2.1 (Global data sharing standard) [3], and 2.2 (Details on data curation & publishing process) (deliverable on request).
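The label derivation for curated data described above can be sketched as a simple score aggregation. This is a minimal illustration only: the component names, the equal weighting, and the three-level scale are assumptions made for the example, not the project's actual scoring scheme.

```python
# Illustrative sketch only: component names, weights, and thresholds are
# assumptions, not the AIDAVA quality-label methodology itself.

def curated_quality_label(source_quality: float,
                          tool_certification: float,
                          annotator_literacy: float,
                          unresolved_checks: int,
                          total_checks: int) -> str:
    """Derive a coarse quality label for curated data (the PHKG) from the
    components named in the text: quality of the source, certification of
    the curation tools, health literacy of the humans who filled semantic
    gaps, and the share of data quality checks that could be resolved.
    All score inputs are assumed to lie in [0, 1]."""
    resolved_ratio = 1.0 if total_checks == 0 else 1 - unresolved_checks / total_checks
    # Equal weighting is a placeholder; a real scheme would calibrate weights.
    score = (source_quality + tool_certification + annotator_literacy + resolved_ratio) / 4
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"

print(curated_quality_label(0.9, 0.8, 0.7, 2, 20))  # → "high"
```

The same pattern would extend to published data by adding components for format compliance, completeness, bias, and the imputation algorithm's certification.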
These deliverables introduce SHACL (Shapes Constraint Language) rules and specific data quality guidelines, contributing to the establishment of data quality practices.
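As a rough illustration of what such a SHACL rule expresses, the check below enforces a minimal "required field with the expected type" constraint on a record. It is a Python analogue of a SHACL sh:minCount/sh:datatype property shape, not SHACL itself, and the field names are invented for the example.

```python
# Analogue of a SHACL shape along the lines of:
#   ex:PatientShape a sh:NodeShape ;
#     sh:property [ sh:path ex:birthDate ; sh:minCount 1 ; sh:datatype xsd:date ] .
# Field names and constraints here are illustrative, not from the deliverable.

from datetime import date

CONSTRAINTS = {
    "patient_id": str,   # required, string-valued
    "birth_date": date,  # required, date-valued
}

def violations(record: dict) -> list[str]:
    """Return one message per constraint the record breaks, mirroring the
    result messages of a SHACL validation report."""
    messages = []
    for field, expected_type in CONSTRAINTS.items():
        if field not in record:
            messages.append(f"{field}: missing (minCount 1 violated)")
        elif not isinstance(record[field], expected_type):
            messages.append(f"{field}: expected {expected_type.__name__}")
    return messages

print(violations({"patient_id": "p-001"}))  # → ['birth_date: missing (minCount 1 violated)']
```

A real SHACL engine would run such shapes directly against the RDF of the PHKG and feed unresolved violations into the quality label.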
data quality, data quality framework, data quality assessment, secondary use, health data
