
pmid: 24768598
Most of the information in Electronic Health Records (EHRs) is represented as free text. Practitioners searching EHRs need to phrase their queries carefully, as the records might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs.

The method uses a graph representation of the lexical units, concepts, and relations in the UMLS Metathesaurus. It is based on random walks over the graph that start at the query terms. Random walks are a well-studied technique on both Web and Knowledge Base datasets.

Our experiments on the TREC Medical Records track show improvements over a strong baseline on both the 2011 and 2012 datasets.

Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The added terms go beyond simple synonyms and also include other kinds of topically related terms.

Expanding queries with related terms from the UMLS Metathesaurus, beyond synonymy, is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.
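The abstract does not spell out how the walk is computed. One common way to run random walks that start at the query terms is Personalized PageRank, where the walker restarts at the query nodes with a fixed probability. The sketch below assumes that formulation, a damping factor of 0.85, a tiny toy graph whose node names are hypothetical (they are not real UMLS identifiers), and a hypothetical `expand_query` helper that appends the top-k highest-scoring lexical nodes to the query. It illustrates the general idea only and is not the authors' implementation.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Personalized PageRank by power iteration on an undirected graph.

    edges: iterable of (u, v) node pairs.
    seeds: nodes where the random walk restarts (here, the query terms).
    Returns a dict mapping each node to its stationary probability.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    seeds = [s for s in seeds if s in neighbors]
    if not seeds:
        raise ValueError("no seed node appears in the graph")
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability 1 - damping the walker jumps back to a seed...
        new_rank = {n: (1 - damping) * restart[n] for n in nodes}
        # ...otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


def expand_query(query_terms, edges, is_term, k=3):
    """Hypothetical helper: append the k highest-ranked lexical nodes."""
    rank = personalized_pagerank(edges, query_terms)
    candidates = [n for n in rank if is_term(n) and n not in query_terms]
    # Sort by descending score; break ties alphabetically for determinism.
    candidates.sort(key=lambda n: (-rank[n], n))
    return list(query_terms) + candidates[:k]


# Toy graph (hypothetical): "term:" nodes are lexical units, "C:" nodes
# are concepts. Concept-to-concept edges stand in for UMLS relations.
EDGES = [
    ("term:heart attack", "C:myocardial_infarction"),
    ("term:myocardial infarction", "C:myocardial_infarction"),
    ("term:mi", "C:myocardial_infarction"),
    ("C:myocardial_infarction", "C:chest_pain"),
    ("term:chest pain", "C:chest_pain"),
    ("C:chest_pain", "C:angina"),
    ("term:angina", "C:angina"),
    ("term:fracture", "C:fracture"),
]

if __name__ == "__main__":
    is_term = lambda n: n.startswith("term:")
    print(expand_query(["term:heart attack"], EDGES, is_term))
```

In this toy run the synonyms of the query concept score highest, and `term:chest pain` is picked up only through a concept-to-concept link, which loosely mirrors the abstract's point that useful expansion terms need not be direct synonyms. The disconnected `term:fracture` receives zero probability.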
Models, Statistical, Natural language processing, Information storage and retrieval, Health Informatics, Unified Medical Language System, Semantics, Computer Science Applications, Pattern Recognition, Automated, Artificial Intelligence, Data Interpretation, Statistical, Data Mining, Electronic Health Records, Computer Simulation, Data mining, Algorithms, Natural Language Processing
| Indicator | Description | Value |
|---|---|---|
| Selected citations | Citations derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 40 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
