Article, 16 Mar 2017, English

Publisher: Wiley

Journal: Statistics in Medicine, volume 36, pages 2514-2521 (issn: 0277-6715, eissn: 1097-0258)

Authors: Katie Harron; Mario Cortina-Borja; Harvey Goldstein

doi: 10.1002/sim.7287

pmid: 28303597

pmc: PMC6205620

handle: 1983/683ee949-fe18-480b-a2d3-f3de85c0aa68

A scaling approach to record linkage

Abstract

With increasing availability of large datasets derived from administrative and other sources, there is an increasing demand for the successful linking of these to provide rich sources of data for further analysis. Variation in the quality of identifiers used to carry out linkage means that existing approaches are often based upon ‘probabilistic’ models, which are based on a number of assumptions, and can make heavy computational demands. In this paper, we suggest a new approach to classifying record pairs in linkage, based upon weights (scores) derived using a scaling algorithm. The proposed method does not rely on training data, is computationally fast, requires only moderate amounts of storage and has intuitive appeal. Copyright © 2017 John Wiley & Sons, Ltd.

Related Organizations

University of London, United Kingdom
London School of Hygiene & Tropical Medicine, United Kingdom
University College London, United Kingdom
Australian Catholic University, Australia

Keywords

Likelihood Functions, Models, Statistical, 330, scaling, Biostatistics, Applications of statistics to biology and medical sciences; meta analysis, correspondence analysis, record linkage, Humans, Medical Record Linkage, Algorithms, Software, data linkage

Impact by BIP!

citations: 17
popularity: Top 10%
influence: Top 10%
impulse: Top 10%