Powered by OpenAIRE graph
arXiv.org e-Print Archive
Other literature type . Preprint . 2020
ZENODO
Conference object . 2020
License: CC BY
Data sources: Datacite
ZENODO
Conference object . 2020
License: CC BY
Data sources: ZENODO
https://doi.org/10.48550/arxiv...
Article . 2020
License: arXiv Non-Exclusive Distribution
Data sources: Datacite
https://doi.org/10.18420/se202...
Other literature type . 2020
Data sources: Datacite
https://doi.org/10.1145/336543...
Conference object . 2020 . Peer-reviewed
Data sources: Crossref
Versions: 7
This Research product is the result of merged Research products in OpenAIRE.

Detecting quality problems in research data

a model-driven approach
Authors: Arno Kesper; Viola Wenz; Gabriele Taentzer;

Abstract

As scientific progress depends heavily on the quality of research data, the scientific community imposes strict requirements on data quality. A major challenge in data quality assurance is to localise quality problems that are inherent to data. Because digitalisation is advancing rapidly in some scientific fields, especially the humanities, different database technologies and data formats may be adopted within short periods of time to gain experience. We present a model-driven approach to analysing the quality of research data that abstracts from the underlying database technology. Based on the observation that many quality problems manifest as anti-patterns, a data engineer formulates analysis patterns that are generic with respect to database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a domain-specific database format. Data analysts then use the resulting concrete patterns to locate quality problems in their databases. As a proof of concept, we implemented tool support that realises this approach for XML databases. We evaluated the approach with respect to expressiveness and performance in the domain of cultural heritage, based on a qualitative study of quality problems occurring in cultural heritage data.
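
The three roles described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool and does not reproduce their pattern language. It assumes a single hypothetical anti-pattern ("a mandatory field is missing or empty"), a made-up XML format with `object`, `title` and `creator` elements, and hypothetical names (`GenericPattern`, `ConcretePattern`, `concretise`, `locate`). Only the standard library's `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericPattern:
    """Format-independent anti-pattern, as a data engineer might state it."""
    name: str
    description: str


@dataclass(frozen=True)
class ConcretePattern:
    """Generic pattern bound to a concrete XML format (hypothetical structure)."""
    generic: GenericPattern
    record_path: str  # ElementTree path selecting the records to check
    field_path: str   # path, relative to a record, of the mandatory field


def concretise(generic, record_path, field_path):
    """Domain-expert step: bind the generic pattern to format-specific paths."""
    return ConcretePattern(generic, record_path, field_path)


def locate(pattern, root):
    """Data-analyst step: return the records that match the anti-pattern."""
    hits = []
    for record in root.iterfind(pattern.record_path):
        field = record.find(pattern.field_path)
        # A field counts as problematic if it is absent or contains only whitespace.
        if field is None or not (field.text or "").strip():
            hits.append(record)
    return hits


# Hypothetical sample data; element names are assumptions, not a real standard.
SAMPLE = """
<collection>
  <object id="o1"><title>Portrait</title><creator>Unknown</creator></object>
  <object id="o2"><title>Landscape</title><creator>  </creator></object>
  <object id="o3"><title>Still life</title></object>
</collection>
"""

MISSING_FIELD = GenericPattern(
    name="missing mandatory field",
    description="A record lacks a required property or leaves it empty.",
)

pattern = concretise(MISSING_FIELD, record_path="object", field_path="creator")
root = ET.fromstring(SAMPLE.strip())
problem_ids = [record.get("id") for record in locate(pattern, root)]
print(problem_ids)  # ['o2', 'o3']
```

Keeping the generic pattern separate from its format-specific binding means the same anti-pattern could, in principle, be bound to a different schema by changing only the two paths passed to `concretise`.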

Comment: 28 pages. This paper is an extended version of a paper to be published in "ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS '20)". Added subtitle

Subjects by Vocabulary

Microsoft Academic Graph classification: Computer science, Domain (software engineering), Quality (business), Software analysis pattern, Scientific progress, Data science, Cultural heritage, XML database, Proof of concept, Data quality

Keywords

FOS: Computer and information sciences, Model-driven development, Computer Science - Information Retrieval, Pattern matching, Data quality, Information Retrieval (cs.IR)

  • Impact (BIP!)
    citations: 1
    popularity: Average
    influence: Average
    impulse: Average
  • Usage (OpenAIRE UsageCounts)
    views: 259
    downloads: 167
Open Access route: Green