Using Distributional Similarity to Organise BioMedical Terminology

Article, English, open access
Weeds, Julie ; Dowdall, James ; Schneider, Gerold ; Keller, Bill ; Weir, David (2005)

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
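The abstract does not name the similarity measures that were compared, nor the exact evaluation protocol. As a minimal sketch only, assuming that each normalised term class is described by counts of (grammatical relation, word) features taken from parser output, the code below weights features with positive pointwise mutual information (PMI), scores term pairs with Lin's information-theoretic similarity measure, and predicts a term's semantic type by copying the type of its most similar neighbour (leave-one-out 1-nearest-neighbour). All names (`FEATURES`, `GOLD`, `pmi_weights`, `lin_similarity`, `predict_type`) are hypothetical, and the counts and type labels are invented toy data, not GENIA figures or results.

```python
import math
from collections import Counter

# Hypothetical toy data: each key stands in for a normalised term class,
# described by counts of (relation, co-occurring word) dependency features.
# These counts are invented for illustration and are NOT taken from GENIA.
FEATURES = {
    "IL-2": Counter({("obj", "induce"): 4, ("subj", "bind"): 2, ("nmod", "gene"): 1}),
    "IL-4": Counter({("obj", "induce"): 3, ("subj", "bind"): 3}),
    "T cell": Counter({("obj", "stimulate"): 5, ("nmod", "line"): 2}),
    "B cell": Counter({("obj", "stimulate"): 4, ("nmod", "line"): 1, ("obj", "induce"): 1}),
}

# Hypothetical gold-standard semantic types, in the spirit of ontology classes.
GOLD = {"IL-2": "protein", "IL-4": "protein", "T cell": "cell_type", "B cell": "cell_type"}


def pmi_weights(features):
    """Return positive PMI weights for each (term, feature) pair."""
    term_totals = {t: sum(c.values()) for t, c in features.items()}
    feat_totals = Counter()
    for counts in features.values():
        feat_totals.update(counts)  # Counter.update adds counts
    n = sum(term_totals.values())
    weights = {}
    for term, counts in features.items():
        weights[term] = {}
        for feat, c in counts.items():
            score = math.log(c * n / (term_totals[term] * feat_totals[feat]))
            if score > 0:  # keep only positively associated features
                weights[term][feat] = score
    return weights


def lin_similarity(w1, w2):
    """Lin's measure: weight of shared features over total feature weight."""
    shared = w1.keys() & w2.keys()
    numerator = sum(w1[f] + w2[f] for f in shared)
    denominator = sum(w1.values()) + sum(w2.values())
    return numerator / denominator if denominator else 0.0


def predict_type(term, weights, gold):
    """Leave-one-out 1-NN: copy the gold type of the most similar other term."""
    best = max(
        (t for t in weights if t != term),
        key=lambda t: lin_similarity(weights[term], weights[t]),
    )
    return gold[best]


weights = pmi_weights(FEATURES)
predictions = {t: predict_type(t, weights, GOLD) for t in FEATURES}
accuracy = sum(predictions[t] == GOLD[t] for t in FEATURES) / len(FEATURES)
print(predictions)
print(f"accuracy = {accuracy:.2f}")
```

Lin's measure with PMI weights is only one common choice; any other distributional measure could be swapped into `lin_similarity` without changing the nearest-neighbour evaluation step. Note that the feature ("obj", "induce") receives a negative PMI for "B cell" in this toy data and is therefore discarded before similarity is computed.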