Towards a highly-scalable and effective metasearch engine

Publication: Article, Conference object
Date: 01 Apr 2001
Publisher: ACM
Journal: Proceedings of the 10th international conference on World Wide Web

Authors: Zonghuan Wu; Weiyi Meng; Clement T. Yu; Zhuogang Li

doi: 10.1145/371920.372093


Abstract

A metasearch engine is a system that supports unified access to multiple local search engines. Database selection is one of the main challenges in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful local search engines to invoke for each user query. In order to enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. In this paper, we propose a highly scalable and accurate database selection method. This method has several novel features. First, the metadata for representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computation efficiency and storage efficiency. Second, our selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective. An operational prototype system has been built based on the proposed approach.
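The abstract does not spell out how the integrated representative is laid out or how engines are scored, so the following is only a minimal sketch of the general idea of database selection over a single shared index. It assumes (hypothetically) that each engine is summarized by one weight per term, that the shared representative keeps only the `r` highest-weighted engines for each term, and that engines are ranked by the sum of the stored weights of the query terms. The function names, the parameter `r`, and the scoring rule are illustrative assumptions, not the authors' method.

```python
import heapq
from collections import defaultdict


def build_integrated_representative(engine_term_weights, r=2):
    """Merge per-engine term weights into one term -> [(weight, engine)] map.

    Assumption for illustration: only the r largest weights per term are kept,
    so storage grows with the vocabulary rather than with the engine count.
    """
    per_term = defaultdict(list)
    for engine, weights in engine_term_weights.items():
        for term, weight in weights.items():
            per_term[term].append((weight, engine))
    return {term: heapq.nlargest(r, pairs) for term, pairs in per_term.items()}


def select_engines(representative, query_terms, k=1):
    """Return up to k engine names ranked by summed stored weights (assumed scoring)."""
    scores = defaultdict(float)
    for term in query_terms:
        for weight, engine in representative.get(term, []):
            scores[engine] += weight
    # Sort by descending score, breaking ties by engine name for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [engine for engine, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical per-engine summaries; the numbers are made up.
    engines = {
        "A": {"web": 0.9, "search": 0.2},
        "B": {"web": 0.3, "search": 0.8},
        "C": {"web": 0.5, "music": 0.7},
    }
    rep = build_integrated_representative(engines, r=2)
    print(select_engines(rep, ["web", "search"], k=2))  # ['A', 'B']
```

Only the query terms are looked up in the shared map, so the cost of selection in this sketch depends on the query length and on `r`, not on the total number of engines.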

Related Organizations

Binghamton University
United States
University of Illinois at Chicago
United States

Impact by BIP!

- selected citations: 26. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
