Categorical linkage‐data analysis

Publication: Article, 10 Jun 2024, United Kingdom, English. Publisher: Wiley. Journal: Statistics in Medicine, volume 43, pages 3463-3483 (ISSN: 0277-6715, eISSN: 1097-0258)

Authors: Li‐Chun Zhang; Tiziana Tuoto;

doi: 10.1002/sim.10134

pmid: 38853711

Abstract

Analysis of integrated data often requires record linkage in order to join together the data residing in separate sources. When linkage errors cannot be avoided, owing to the lack of a unique identity key that can be used to link the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match‐key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file, and our approach allows the probabilities of correct linkage to vary across the records without requiring an estimate of this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data.
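
To make the idea of adjusting by the proportion of false links concrete, the following is a minimal sketch under a deliberately simplified assumption that is not taken from the paper: every false link pairs the two variables independently, so the linked-file table is a mixture of the true joint distribution and the product of its margins. The function names, the 2x2 table, and the false-link rate of 0.2 are all hypothetical. The paper's methods are more general (they allow heterogeneous linkage probabilities and unmatched records), which this sketch does not attempt to reproduce.

```python
import numpy as np


def mix_with_false_links(p_true, false_link_rate):
    """Cell probabilities seen in the linked file.

    Illustrative assumption (not the paper's model): a false link pairs
    the row and column variables independently, so false links follow
    the product of the margins.
    """
    p_true = np.asarray(p_true, dtype=float)
    independent = np.outer(p_true.sum(axis=1), p_true.sum(axis=0))
    return (1.0 - false_link_rate) * p_true + false_link_rate * independent


def adjust_for_false_links(p_obs, false_link_rate):
    """Invert the mixture above, given the proportion of false links."""
    p_obs = np.asarray(p_obs, dtype=float)
    # The mixture leaves the margins unchanged, so they can be read
    # directly from the observed table.
    independent = np.outer(p_obs.sum(axis=1), p_obs.sum(axis=0))
    return (p_obs - false_link_rate * independent) / (1.0 - false_link_rate)


def odds_ratio(p):
    """Odds ratio of a 2x2 table of cell probabilities."""
    return (p[0, 0] * p[1, 1]) / (p[0, 1] * p[1, 0])


# Hypothetical true association and false-link proportion.
p_true = np.array([[0.4, 0.1],
                   [0.1, 0.4]])
false_link_rate = 0.2

p_obs = mix_with_false_links(p_true, false_link_rate)
p_adj = adjust_for_false_links(p_obs, false_link_rate)

print("odds ratio, true table:    ", odds_ratio(p_true))
print("odds ratio, linked file:   ", odds_ratio(p_obs))
print("odds ratio, after adjusting:", odds_ratio(p_adj))
```

In this toy setting, treating the linked table as true observations understates the association, while the adjusted table recovers it exactly because the false-link proportion is assumed known and the mixture assumption holds by construction.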

Country

United Kingdom

Related Organizations

National Institute of Statistics
Italy
University of Southampton
United Kingdom

Keywords

Models, Statistical, 330, logistic regression, heterogeneous linkage error, Applications of statistics to biology and medical sciences; meta analysis, linkage data structure, analysis of contingency table, Data Interpretation, Statistical, incomplete match space, Humans, Computer Simulation, Medical Record Linkage, secondary analysis, Probability
