Issues in learning an ontology from text

Article English OPEN
Brewster, Christopher ; Jupp, Simon ; Luciano, Joanne ; Shotton, David ; Stevens, Robert D ; Zhang, Ziqi (2009)
  • Publisher: BioMed Central
  • Journal: BMC Bioinformatics (vol: 10, pp: S1-S1)
  • Related identifiers: pmc: PMC2679401, doi: 10.1186/1471-2105-10-S5-S1
  • Subject: Molecular Biology | Biochemistry | Computer Science Applications | Proceedings

Ontology construction for any domain is a labour-intensive and complex process. Any methodology that can reduce its cost and increase its efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text-processing steps, and here we describe the choices made to clean the input, to derive a set of terms, and to structure those terms into a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately when starting from a heterogeneous corpus.

Results - Using mainly automated techniques, we were able to construct an 18,055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to remove unwanted terms from the nascent ontology using lexico-syntactic patterns that tested whether each term should be included in the ontology. We used the same technique to test for subsumption relationships between the remaining terms, adding structure to the initially broad and shallow hierarchy we had generated. All outputs are available at webcite.

Conclusion - We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can contribute to both ontology learning and ontology maintenance. The method is useful both for exploring a scientific domain and as a stepping stone towards formally rigorous ontologies. Filtering the terms recognised in a heterogeneous corpus down to those that fall within the ontology's topic is identified as one of the main challenges for research in ontology learning.
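The abstract says unwanted terms were removed, and subsumption links added, by testing lexico-syntactic patterns. As an illustration only, the Python sketch below counts matches of a few classic Hearst-style templates ("X such as Y", "Y and other X", "Y is a kind of X") in a local list of sentences. The templates, the default root term `behaviour`, the `min_hits` threshold, and the helper names `pattern_hits`, `is_in_domain` and `is_subsumed` are hypothetical; the paper's actual patterns and evidence source may differ.

```python
import re
from typing import Sequence

# Illustrative Hearst-style templates (assumptions for this sketch, not the
# authors' exact patterns). {hyper} = candidate parent, {hypo} = candidate child.
TEMPLATES = [
    r"\b{hyper}s? such as {hypo}\b",                              # "behaviours such as grooming"
    r"\b{hypo} and other {hyper}s?\b",                            # "foraging and other behaviours"
    r"\b{hypo} is an? (?:type of |kind of |form of )?{hyper}\b",  # "X is a form of Y"
]


def pattern_hits(hyper: str, hypo: str, sentences: Sequence[str]) -> int:
    """Count sentences in which at least one template links hypo to hyper."""
    compiled = [
        re.compile(t.format(hyper=re.escape(hyper), hypo=re.escape(hypo)), re.IGNORECASE)
        for t in TEMPLATES
    ]
    return sum(1 for s in sentences if any(p.search(s) for p in compiled))


def is_in_domain(term: str, sentences: Sequence[str],
                 root: str = "behaviour", min_hits: int = 1) -> bool:
    """Term-inclusion test: keep a term only if the corpus attests it as a kind of `root`."""
    return pattern_hits(root, term, sentences) >= min_hits


def is_subsumed(child: str, parent: str, sentences: Sequence[str], min_hits: int = 1) -> bool:
    """Subsumption test: accept parent -> child only if the patterns attest it."""
    return pattern_hits(parent, child, sentences) >= min_hits


if __name__ == "__main__":
    corpus = [
        "Behaviours such as grooming reduce parasite load.",
        "Foraging and other behaviours were recorded hourly.",
        "Allogrooming is a form of social behaviour in primates.",
        "The enzyme was purified by chromatography.",
    ]
    candidates = ["grooming", "foraging", "chromatography"]
    print([t for t in candidates if is_in_domain(t, corpus)])     # ['grooming', 'foraging']
    print(is_subsumed("allogrooming", "social behaviour", corpus))  # True
```

The optional plural `s?` is a crude stand-in for morphological normalisation, and a single matching sentence is weak evidence; a practical filter would likely use a larger evidence source and a higher threshold.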
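The recall and precision figures quoted in the abstract compare the extracted terms with a set of animal behaviour terms. The sketch below shows the standard set-based definitions, assuming exact matching after lower-casing and whitespace collapsing (an assumption; the abstract does not say how matches were counted). The helper names and example terms are hypothetical.

```python
from typing import Iterable, Set, Tuple


def normalise(terms: Iterable[str]) -> Set[str]:
    """Lower-case and collapse internal whitespace so 'Social  Behaviour' == 'social behaviour'."""
    return {" ".join(t.lower().split()) for t in terms}


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Exact-match precision and recall of extracted terms against a gold-standard list.

    precision = |extracted & gold| / |extracted|
    recall    = |extracted & gold| / |gold|
    """
    ext, ref = normalise(extracted), normalise(gold)
    if not ext or not ref:
        raise ValueError("both term lists must be non-empty")
    hits = len(ext & ref)
    return hits / len(ext), hits / len(ref)


if __name__ == "__main__":
    extracted = ["grooming", "Foraging", "chromatography", "vigilance"]
    gold = ["grooming", "foraging", "vigilance", "play", "mating"]
    p, r = precision_recall(extracted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

High recall with low precision, as reported, means most reference terms were found but most extracted terms fall outside the reference set, which is why the filtering step matters.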
