Deep-learning-based automated terminology mapping in OMOP-CDM

Publication: Article, 13 May 2021, English

Publisher: Oxford University Press (OUP)

Journal: Journal of the American Medical Informatics Association, volume 28, pages 1489-1496 (eISSN: 1527-974X)

Authors: Byungkon Kang; Jisang Yoon; Ha Young Kim; Sung Jin Jo; Yourim Lee; Hye Jin Kam

doi: 10.1093/jamia/ocab030

pmid: 33987667

pmc: PMC8279781


Abstract

Objective: Accessing medical data from multiple institutions is difficult owing to the interinstitutional diversity of vocabularies. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that can automate the process of semantic interinstitutional code mapping.

Materials and Methods: To automate mapping between source and target codes, we compute the embedding-based semantic similarity between their corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to traditional word-based mappings.

Results: The proposed model is compared against Usagi, the state-of-the-art automated matching system of the Observational Medical Outcomes Partnership (OMOP) common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements.

Discussion: The proposed deep-learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system.

Conclusion: Incorporating the semantics of code descriptions yields significantly higher matching accuracy than traditional text co-occurrence-based approaches. The negative training sample collection methodology is also an important component of the proposed trainable system, and it can be adopted in both present and future related systems.
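The pipeline the abstract describes (embed each code description, rank target codes by similarity, train with several negatives per positive, and score by top-k accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the character-trigram `embed` function is a toy stand-in for the learned sentence encoder, the negatives are drawn uniformly at random (the abstract does not say how the authors select them, only that the choice matters), and the codes `T1` to `T4` and their descriptions are hypothetical, not real OMOP concepts.

```python
import math
import random
from collections import Counter


def embed(text):
    """Toy stand-in for a learned sentence encoder: character-trigram counts."""
    s = f"  {text.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k(source_desc, targets, k):
    """Return the k target codes whose descriptions are most similar to the source."""
    q = embed(source_desc)
    ranked = sorted(targets, key=lambda c: cosine(q, embed(targets[c])), reverse=True)
    return ranked[:k]


def make_training_pairs(gold, targets, n_neg, rng):
    """Emit one positive pair plus n_neg randomly sampled negatives per source code.

    Uniform random sampling is an assumption; other selection strategies
    would plug in at the rng.sample call.
    """
    pairs = []
    for source_desc, pos_code in gold:
        pairs.append((source_desc, targets[pos_code], 1))
        candidates = [c for c in targets if c != pos_code]
        for neg_code in rng.sample(candidates, n_neg):
            pairs.append((source_desc, targets[neg_code], 0))
    return pairs


def top_k_accuracy(gold, targets, k):
    """Fraction of source codes whose correct target appears in the top k."""
    hits = sum(pos in top_k(src, targets, k) for src, pos in gold)
    return hits / len(gold)


# Hypothetical target vocabulary and gold mappings (not real OMOP concepts).
targets = {
    "T1": "Type 2 diabetes mellitus",
    "T2": "Essential hypertension",
    "T3": "Acute myocardial infarction",
    "T4": "Chronic kidney disease",
}
gold = [
    ("diabetes mellitus type II", "T1"),
    ("hypertension, essential", "T2"),
    ("myocardial infarction acute", "T3"),
]

if __name__ == "__main__":
    pairs = make_training_pairs(gold, targets, n_neg=2, rng=random.Random(0))
    print(len(pairs), "training pairs")
    for k in (1, 2, 3):
        print(f"top-{k} accuracy:", top_k_accuracy(gold, targets, k))
```

In a trainable system, the labeled pairs from `make_training_pairs` would be used to fit the encoder so that positive pairs score higher than negative ones; the trigram encoder here has no parameters and only shows where such a model would sit.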

Related Organizations

State University of New York at Potsdam, United States
Pohang University of Science and Technology, Korea (Republic of)
State University of New York, United States
Yonsei University, Korea (Republic of)

Keywords

Deep Learning, Humans, Algorithms, Language, Semantics

Impact by BIP!

Selected citations (derived from selected sources): 17
Popularity (current impact, based on the underlying citation network): Top 10%
Influence (overall impact over time, based on the underlying citation network): Top 10%
Impulse (initial momentum directly after publication, based on the underlying citation network): Top 10%


Open access route: hybrid

Fields of Science

medical and health sciences

basic medicine
