Attribute classification using feature analysis

Type: Article, Conference object
Date: 25 Jun 2003
Publisher: IEEE Comput. Soc
Journal: Proceedings 18th International Conference on Data Engineering

Authors: Felix Naumann; Ching-Tien Ho; Xuqing Tian; Laura M. Haas; Nimrod Megiddo

doi: 10.1109/icde.2002.994725


Abstract

The basis of many systems that integrate data from multiple sources is a set of correspondences between source schemata and a target schema. Correspondences express a relationship between sets of source attributes, possibly from multiple sources, and a set of target attributes. Clio is an integration tool that assists users in defining value correspondences between attributes. In real-life scenarios there may be many sources, and the source relations may have many attributes. Users can get lost and might miss or be unable to find some correspondences. Also, in many real-life schemata the attribute names reveal little or nothing about the semantics of the data values. Only the data values in the attribute columns can convey the semantic meaning of the attribute. Our work relieves users of the problems of too many attributes and meaningless attribute names by automatically suggesting correspondences between source and target attributes. For each attribute, we analyze the data values and derive a set of features.
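To make the idea of "deriving features from data values" concrete, here is a minimal sketch, assuming a handful of made-up features (digit ratio, letter ratio, squashed average length, distinct-value ratio) and a plain nearest-neighbour match in feature space. Neither the feature set nor the matching rule comes from the paper; the names `column_features` and `suggest_correspondences` are hypothetical, and the nearest-neighbour step is a stand-in for whatever classifier the authors actually use. It only illustrates how a source column with a meaningless name (`c1`, `c2`) can still be paired with a target attribute by looking at its values.

```python
import math
from statistics import mean


def column_features(values: list[str]) -> list[float]:
    """Hypothetical feature vector for one attribute column (not the paper's features)."""
    if not values:
        return [0.0, 0.0, 0.0, 0.0]
    total_chars = sum(len(v) for v in values) or 1  # avoid division by zero
    digits = sum(c.isdigit() for v in values for c in v)
    letters = sum(c.isalpha() for v in values for c in v)
    avg_len = mean(len(v) for v in values)
    distinct_ratio = len(set(values)) / len(values)
    # Squash average length into [0, 1) so it does not dominate the distance.
    return [
        digits / total_chars,
        letters / total_chars,
        avg_len / (1 + avg_len),
        distinct_ratio,
    ]


def suggest_correspondences(
    source: dict[str, list[str]], target: dict[str, list[str]]
) -> dict[str, str]:
    """Pair each source attribute with the target attribute whose features are closest.

    Nearest-neighbour matching is an illustrative stand-in for a trained classifier.
    """
    target_feats = {name: column_features(vals) for name, vals in target.items()}
    suggestions = {}
    for s_name, s_vals in source.items():
        f = column_features(s_vals)
        suggestions[s_name] = min(
            target_feats, key=lambda t: math.dist(f, target_feats[t])
        )
    return suggestions


if __name__ == "__main__":
    # Source columns with uninformative names; only the values carry meaning.
    source = {
        "c1": ["95120", "10001", "60614"],
        "c2": ["Alice Smith", "Bob Jones", "Carol White"],
    }
    target = {
        "zip": ["94301", "02139"],
        "name": ["Dan Brown", "Eve Black"],
    }
    print(suggest_correspondences(source, target))  # {'c1': 'zip', 'c2': 'name'}
```

Here the all-digit, five-character column `c1` lands on `zip`, and the letter-heavy column `c2` lands on `name`, even though the source names say nothing. A real system would use richer features and return ranked suggestions for the user to confirm rather than a single forced match.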

Related Organizations

IBM Research - Almaden (United States)
IBM (United States)

Impact by BIP!

- Selected citations: 12. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
