Schema label normalization for improving schema matching

Publication: Article, 01 Dec 2010, Italy, English
Publisher: Elsevier BV
Journal: Data & Knowledge Engineering, volume 69, pages 1254-1273 (ISSN: 0169-023X)

Authors: SORRENTINO, Serena; BERGAMASCHI, Sonia; GAWINECKI, Maciej; PO, Laura

doi: 10.1016/j.datak.2010.10.004

handle: 11380/646390


Abstract

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and structure. Starting from the "hidden meaning" associated with schema labels (i.e., class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e., annotation with respect to a thesaurus or lexical resource) helps in associating a "meaning" with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a schema label normalization method that increases the number of comparable labels. The method semi-automatically expands abbreviations and acronyms and annotates compound nouns with minimal manual effort. We show empirically that our normalization method helps identify similarities among schema elements of different data sources, thus improving schema matching results.
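
To make the idea of label normalization concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a toy word list (`DICTIONARY`) standing in for the lexical resource and a hand-written abbreviation table (`ABBREVIATIONS`); both, along with the function names `tokenize` and `normalize`, are hypothetical. Labels are split into tokens, known abbreviations are expanded, and any remaining non-dictionary token is flagged for manual review, which loosely mirrors the semi-automatic nature of the method described above.

```python
import re

# Hypothetical stand-in for a lexical resource (e.g. a thesaurus).
DICTIONARY = {"customer", "name", "order", "quantity", "number", "address"}

# Hypothetical abbreviation table; a real system would derive candidate
# expansions from several sources rather than a fixed mapping.
ABBREVIATIONS = {"cust": "customer", "qty": "quantity",
                 "no": "number", "addr": "address"}


def tokenize(label):
    """Split a schema label on separators, camelCase boundaries and digits."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        # Acronym runs (XML), capitalized or lowercase words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def normalize(label):
    """Return (normalized label, tokens that still need manual review)."""
    expanded, unresolved = [], []
    for token in tokenize(label):
        if token in DICTIONARY or token.isdigit():
            expanded.append(token)
        elif token in ABBREVIATIONS:
            expanded.append(ABBREVIATIONS[token])
        else:
            # Non-dictionary word with no known expansion: keep it as-is
            # and report it so a human can decide.
            expanded.append(token)
            unresolved.append(token)
    return " ".join(expanded), unresolved


if __name__ == "__main__":
    for label in ["CustName", "customer_name", "order_qty", "DelivAddr"]:
        print(label, "->", normalize(label))
```

Under these assumptions, `CustName` and `customer_name` normalize to the same string and so become directly comparable, while `DelivAddr` is only partly resolved and `deliv` is returned for review.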

Country

Italy

Related Organizations

University of Modena and Reggio Emilia
Italy

Keywords

schema matching; normalization; natural language for DKE; lexical annotation; interoperability; heterogeneity

Impact by BIP!

Selected citations: 19. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.