Improving search over Electronic Health Records using UMLS-based query expansion through random walks

Publication: Article, 01 Oct 2014, English

Publisher: Elsevier BV

Journal: Journal of Biomedical Informatics, volume 51, pages 100-106 (ISSN: 1532-0464)

Funded by: CHIST-ERA | READERS

Authors: David Martínez; Arantxa Otegi; Aitor Soroa; Eneko Agirre

doi: 10.1016/j.jbi.2014.04.013

pmid: 24768598


Abstract

Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs.

The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied discipline in both Web and Knowledge Base datasets.

Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline.

Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms.

Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
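To make the idea of random walks that start on the query terms more concrete, the following is a minimal sketch using personalized PageRank, one common formulation of such walks. The abstract does not state the exact algorithm or parameters, so this should not be read as the authors' implementation. The toy graph, the `C:` prefix marking concept nodes, the function names, the damping factor and the iteration count are all illustrative assumptions; a real system would build the graph from the UMLS Metathesaurus.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank on an undirected graph.

    The walk restarts at the seed nodes (here: the query terms).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Restart distribution: uniform over the seeds that occur in the graph.
    seeds_in_graph = [s for s in seeds if s in neighbors]
    if not seeds_in_graph:
        raise ValueError("no seed node occurs in the graph")
    teleport = {n: 0.0 for n in nodes}
    for s in seeds_in_graph:
        teleport[s] = 1.0 / len(seeds_in_graph)

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            # Each node passes its damped rank equally to its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def expand_query(edges, query_terms, k=3):
    """Return up to k reachable term nodes with the highest rank.

    Assumption: concept nodes carry a hypothetical "C:" prefix and are
    never returned as expansion terms.
    """
    rank = personalized_pagerank(edges, query_terms)
    candidates = [
        n for n in rank
        if n not in query_terms and not n.startswith("C:") and rank[n] > 0.0
    ]
    return sorted(candidates, key=lambda n: (-rank[n], n))[:k]


# Hypothetical toy graph: lexical units linked to concepts, concepts linked
# to related concepts. The identifiers are made up, not real UMLS CUIs.
TOY_EDGES = [
    ("heart attack", "C:MI"),
    ("myocardial infarction", "C:MI"),
    ("C:MI", "C:CAD"),
    ("coronary artery disease", "C:CAD"),
    ("C:CAD", "C:ANGINA"),
    ("angina", "C:ANGINA"),
    ("fracture", "C:FRACTURE"),
    ("C:FRACTURE", "C:BONE"),
    ("bone", "C:BONE"),
]

print(expand_query(TOY_EDGES, ["heart attack"], k=2))
```

On this toy graph the synonym "myocardial infarction" ranks ahead of the topically related "coronary artery disease", while terms in the disconnected fracture component receive no rank and are never suggested. This mirrors, in miniature, the claim that expansion can reach beyond direct synonyms to related terms.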

Keywords

Models, Statistical; Natural Language Processing; Information Storage and Retrieval; Health Informatics; Unified Medical Language System; Semantics; Computer Science Applications; Pattern Recognition, Automated; Artificial Intelligence; Data Interpretation, Statistical; Data Mining; Electronic Health Records; Computer Simulation; Algorithms

Related research

negex software on Google Code (relation: IsRelatedTo)

Impact by BIP!

selected citations: 40. These citations are derived from selected sources. This is an alternative to the "influence" indicator.

popularity: Top 10%. This indicator reflects the "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network.

influence: Top 10%. This indicator reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).

impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.