<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>

COPY SCRIPT

For further information contact us at helpdesk@openaire.eu

Finding biomedical categories in Medline®

descriptionPublicationkeyboard_double_arrow_right Article , Other literature type 01 Oct 2012Publisher:Springer Science and Business Media LLCJournal:Journal of Biomedical Semantics, volume 3 (eissn: 2041-1480,

Authors: W. John Wilbur; Won Kim; Donald C. Comeau; Lana Yeganova;

doi: 10.1186/2041-1480-3-s3-s3

pmid: 23046816

pmc: PMC3465206

Finding biomedical categories in Medline®

- Summary
- Subjects
- Metrics

Abstract

Abstract Background There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly defined ontologies. Automatically identifying meaningful categories of entities in a large text corpus is useful for information extraction, construction of machine learning features, and development of semantic representations. In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent nouns from Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms as potential biomedical categories. Results We study and compare these two alternative sets of terms to identify semantic categories in Medline. We find that both approaches produce reasonable terms as potential categories. We also find that there is a significant agreement between the two sets of terms. The overlap between the two methods improves our confidence regarding categories predicted by these independent methods. Conclusions This study is an initial attempt to extract categories that are discussed in Medline. Rather than imposing external ontologies on Medline, our methods allow categories to emerge from the text.

Related Organizations

National Institute of Health
Pakistan
National Institutes of Health
United States
United States National Library of Medicine
United States
National Institute of Health (NIH/NICHD)
United States

Keywords

Research, Computer applications to medicine. Medical informatics, R858-859.7

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	7
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

Average

Green

gold

Fields of Science (3) View all

medical and health sciences

basic medicine

Fields of Science

medical and health sciences

basic medicine

View all