
Sampling Search-Engine Results

Publication: Article, Conference object. 01 Jan 2005. Italy. English. Publisher: Springer Science and Business Media LLC. Journal: World Wide Web, volume 9, pages 397-429 (ISSN: 1386-145X, eISSN: 1573-1413)

Authors: Aristidis Anagnostopoulos; A. Z. Broder; D. Carmel

DOI: 10.1007/s11280-006-0222-z, 10.1145/1060745.1060784

Handle: 11573/332444, 11573/337348

Abstract

We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approximate algorithms for several applications, such as:

- Determining the set of categories in a given taxonomy spanned by the search results;
- Finding the range of metadata values associated to the result set in order to enable "multi-faceted search";
- Estimating the size of the result set;
- Data mining associations to the query terms.

We present and analyze an efficient algorithm for obtaining uniform random samples applicable to any search engine based on posting lists and document-at-a-time evaluation. (To our knowledge, all popular Web search engines, e.g. Google, Inktomi, AltaVista, AllTheWeb, belong to this class.)

Furthermore, our algorithm can be modified to follow the modern object-oriented approach whereby posting lists are viewed as streams equipped with a next method, and the next method for Boolean and other complex queries is built from the next method for primitive terms. In our case we show how to construct a basic next(p) method that samples term posting lists with probability p, and show how to construct next(p) methods for Boolean operators (AND, OR, WAND) from primitive methods.

Finally, we test the efficiency and quality of our approach on both synthetic and real-world data.
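The idea of sampling a result stream with probability p, and of estimating the result-set size from the sample, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the names `PostingList`, `intersect`, `geometric_skip`, `sample_stream` and `estimate_size` are hypothetical, the AND operator is a plain document-at-a-time intersection, and the sampler counts through skipped results instead of seeking past them in the posting lists, so it shows the sampling distribution but none of the efficiency gains the abstract describes.

```python
import math
import random
from typing import Iterable, Iterator, List, Optional


class PostingList:
    """A sorted list of non-negative document IDs, read as a stream."""

    def __init__(self, doc_ids: Iterable[int]) -> None:
        self.doc_ids: List[int] = sorted(doc_ids)
        self.pos = 0

    def next(self, target: int) -> Optional[int]:
        """Advance to the first doc ID >= target; return None at the end."""
        while self.pos < len(self.doc_ids) and self.doc_ids[self.pos] < target:
            self.pos += 1
        return self.doc_ids[self.pos] if self.pos < len(self.doc_ids) else None


def intersect(a: PostingList, b: PostingList) -> Iterator[int]:
    """Document-at-a-time AND: yield doc IDs that occur in both lists."""
    target = 0
    while True:
        x = a.next(target)
        if x is None:
            return
        y = b.next(x)
        if y is None:
            return
        if x == y:
            yield x
            target = x + 1
        else:
            target = y  # y > x, so list a must catch up to y


def geometric_skip(p: float, rng: random.Random) -> int:
    """Number of results rejected before the next accepted one."""
    if p >= 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return int(math.log(u) / math.log1p(-p))


def sample_stream(stream: Iterable[int], p: float, rng: random.Random) -> Iterator[int]:
    """Keep each result independently with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    skip = geometric_skip(p, rng)
    for doc_id in stream:
        if skip == 0:
            yield doc_id
            skip = geometric_skip(p, rng)
        else:
            skip -= 1


def estimate_size(sample_count: int, p: float) -> float:
    """Unbiased estimate of the result-set size from a p-sample."""
    return sample_count / p


if __name__ == "__main__":
    rng = random.Random(0)
    evens = PostingList(range(0, 100_000, 2))
    threes = PostingList(range(0, 100_000, 3))
    sample = list(sample_stream(intersect(evens, threes), 0.05, rng))
    print(len(sample), estimate_size(len(sample), 0.05))
```

Drawing the gap between accepted results from a geometric distribution gives the same distribution as flipping a p-biased coin per result, but it needs only one random number per accepted result. That is what would let a real engine skip ahead in the posting lists rather than visit every match.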

Country

Italy

Keywords

Sampling; Search engines; WAND

Impact by BIP!

- citations: 52. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
