Searching with numbers

descriptionPublicationkeyboard_double_arrow_right Article , Conference object 01 Jan 2002 English Publisher:Institute of Electrical and Electronics Engineers (IEEE)Journal:IEEE Transactions on Knowledge and Data Engineering, volume 15, pages 855-870 (issn: 1041-4347,

Copyright policy )

Authors: Rakesh Agrawal 0001; Ramakrishnan Srikant;

doi: 10.1109/tkde.2003.1209004 , 10.1145/511499.511501 , 10.1145/511446.511501

Searching with numbers

- Summary
- Metrics

Abstract

A large fraction of the useful web comprises of specification documents that largely consist of hattribute name, numeric valuei pairs embedded in text. Examples include product information, classified advertisements, resumes, etc. The approach taken in the past to search these documents by first establishing correspondences between values and their names has achieved limited success because of the difficulty of extracting this information from free text. We propose a new approach that does not require this correspondence to be accurately established. Provided the data has "low reflectivity", we can do effective search even if the values in the data have not been assigned attribute names and the user has omitted attribute names in the query. We give algorithms and indexing structures for implementing the search. We also show how hints (i. e, imprecise, partial correspondences) from automatic data extraction techniques can be incorporated into our approach for better accuracy on high reflectivity datasets. Finally, we validate our approach by showing that we get high precision in our answers on real datasets from a variety of domains.

Related Organizations

IBM (United States)
United States
IBM Research - Almaden
United States

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	22
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average