Powered by OpenAIRE graph

This Research product is the result of merged Research products in OpenAIRE.

DataLink Record Linkage Software Applied to the Cancer Registry of Murcia, Spain

Authors: Chirlaque; Márquez Cid M; Navarro C

Abstract

Objectives: Record linkage between data sets is relatively simple when each data set contains unique, universal, permanent, and common variables. This situation occurs infrequently, so probabilistic methods are needed to identify corresponding records. DataLink has been tested to determine whether clustering techniques improve performance with a minimal decrease in accuracy.

Methods: The study uses cancer registry data, which include hospital discharge and pathology reports from two hospitals in the Murcia Region for the years 2002-2003. These data are standardized before running DataLink. The original version of DataLink compares all of the records one by one; two later versions of the software apply clustering, which filters on one or more variables. Computing time and the proportion of detected matches were investigated for each version.

Results: The clustering versions achieve 96.1% and 96.2% accuracy, respectively. Compared with the original version, the two clustering versions reduce computational time by 97.3% and 98.6%. The clustering versions lose 0.36% and 1.07% of real duplicates, respectively.

Conclusions: DataLink implements deterministic and probabilistic record linkage to eliminate duplicates and to merge new information with existing cases. The standardization of variables to a common format has been adapted to the characteristics of Spanish language data. Clustering techniques minimize computational time and maximize accuracy in the detection of corresponding records.
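The trade-off described in the abstract (fewer comparisons at the cost of a few missed duplicates) can be illustrated with a minimal sketch. This is not DataLink's algorithm, which the abstract does not specify; it only contrasts an exhaustive pairwise comparison with a filtered one, a technique commonly called blocking. The records, field names, the `agreement` score, and the 0.6 threshold are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already-standardized records (not data from the study).
records = [
    {"id": 1, "surname": "garcia", "name": "maria", "birth_year": 1950},
    {"id": 2, "surname": "garcia", "name": "maria", "birth_year": 1950},
    {"id": 3, "surname": "lopez", "name": "jose", "birth_year": 1962},
    {"id": 4, "surname": "lopez", "name": "josefa", "birth_year": 1962},
    {"id": 5, "surname": "garcia", "name": "maria", "birth_year": 1951},
]

FIELDS = ("surname", "name", "birth_year")


def agreement(a, b):
    """Fraction of fields on which two records agree exactly (assumed score)."""
    return sum(a[f] == b[f] for f in FIELDS) / len(FIELDS)


def all_pairs(recs):
    """Exhaustive comparison: every record against every other record."""
    return list(combinations(recs, 2))


def blocked_pairs(recs, key):
    """Filtered comparison: only records sharing the same value of `key`."""
    blocks = defaultdict(list)
    for r in recs:
        blocks[r[key]].append(r)
    pairs = []
    for group in blocks.values():
        pairs.extend(combinations(group, 2))
    return pairs


def matches(pairs, threshold):
    """Ids of candidate pairs whose agreement score reaches the threshold."""
    return {(a["id"], b["id"]) for a, b in pairs if agreement(a, b) >= threshold}


full = all_pairs(records)
blocked = blocked_pairs(records, "birth_year")

print(len(full), len(blocked))        # 10 candidate pairs vs 2
print(sorted(matches(full, 0.6)))     # [(1, 2), (1, 5), (2, 5), (3, 4)]
print(sorted(matches(blocked, 0.6)))  # [(1, 2), (3, 4)]
```

In this toy example, filtering on `birth_year` cuts the candidate pairs from 10 to 2, but the pairs involving record 5 are never compared because its birth year differs by one. That is the kind of lost duplicate the Results paragraph quantifies for the two clustering versions.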

Keywords

Spain, Data Collection, Neoplasms, Humans, Registries, Child, Software

  • BIP! impact indicators
    citations: 7
    popularity: Average
    influence: Average
    impulse: Average