Scalable Record Linkage

Type: Article, Conference object

Date: 01 Dec 2018

Publisher: IEEE

Journal: 2018 IEEE International Conference on Big Data (Big Data)

Authors: Luke Wolcott; William Clements; Prasad Saripalli

doi: 10.1109/bigdata.2018.8622516

Abstract

We present a record linkage solution that scales to big data volumes and velocities. Our method trains a Siamese deep learning network to encode records so that matching can be fast and distributed. Compared to current state-of-the-art methods that use similarity functions and blocking, our solution links 100x the data in roughly 60% of the time, with comparable precision and recall. We detail the design, training, and implementation of our method, and illustrate model and runtime performance using a large US physician database and streaming data, with an implementation in Keras/TensorFlow and Spark.
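The encode-then-match idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained Siamese network is replaced here by a hypothetical stand-in encoder (hashed character trigrams), and the names `encode`, `link`, `DIM`, the sample records, and the 0.5 threshold are all assumptions made for illustration. The point is only that once every record is a fixed-length vector, linking reduces to a similarity search over vectors rather than pairwise field comparisons.

```python
import hashlib
import math
from typing import Dict, List, Optional, Tuple

DIM = 256  # embedding size; an arbitrary choice for this sketch


def encode(record: str, dim: int = DIM) -> List[float]:
    """Stand-in encoder: hashed character trigrams, L2-normalised.

    Assumption: in the paper this role is played by a trained Siamese
    network. The hashing scheme here is only a placeholder for it.
    """
    text = "  " + record.lower().strip() + "  "
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        gram = text[i:i + 3]
        digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def link(query: str,
         index: Dict[str, List[float]],
         threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the most similar record id, or None if below the threshold."""
    q = encode(query)
    best_id, best_score = None, -1.0
    for rec_id, vec in index.items():
        score = cosine(q, vec)
        if score > best_score:
            best_id, best_score = rec_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical reference records, encoded once up front.
reference = {
    "p1": "John A. Smith, 12 Main St, Seattle WA",
    "p2": "Maria Gonzalez, 400 Pine Ave, Portland OR",
}
index = {rec_id: encode(text) for rec_id, text in reference.items()}

match_id, match_score = link("Jon A Smith, 12 Main Street, Seattle WA", index)
miss_id, miss_score = link("Wei Zhang, 77 Harbor Blvd, Boston MA", index)
```

Because each record is encoded independently of every other record, the index can be built in parallel and split across workers, which is one plausible reading of why the abstract describes matching as fast and distributed.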

Impact by BIP!

- selected citations: 3
- popularity: Average
- influence: Average
- impulse: Average

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
