Publication: Article, Conference object
Date: 24 Mar 2004
Language: English
Publisher: Wiley
Journal: Software: Practice and Experience, volume 34, pages 711-726 (ISSN: 0038-0644, eISSN: 1097-024X)

Authors: M. Santini; S. Vigna; P. Boldi; B. Codenotti

DOI: 10.1002/spe.587

Handle: 20.500.14243/58351, 20.500.14243/46159, 2434/4768

UbiCrawler: a scalable fully distributed Web crawler

Abstract

We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear scalability, graceful degradation in the presence of faults, a very effective assignment function (based on consistent hashing) for partitioning the domain to crawl, and, more generally, the complete decentralization of every task. The necessity of handling very large sets of data has highlighted some limitations of the Java APIs, which prompted the authors to partially reimplement them. Copyright © 2004 John Wiley & Sons, Ltd.
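The abstract credits UbiCrawler's partitioning of the domain to crawl to an assignment function based on consistent hashing. The following Python sketch illustrates the general technique only. It is not the authors' Java implementation. Each agent is placed at many pseudo-random points on a hash ring. A host is then assigned to the agent that owns the first point clockwise from the host's own hash. The class name `ConsistentHashRing`, the `replicas` count of 100, the use of SHA-1, and all agent and host names are hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical sketch: assigns hosts to crawling agents by consistent hashing."""

    def __init__(self, agents, replicas=100):
        self.replicas = replicas  # virtual points per agent (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> agent name
        for agent in agents:
            self.add_agent(agent)

    def add_agent(self, agent):
        for i in range(self.replicas):
            p = _point(f"{agent}#{i}")
            if p in self._owner:
                continue  # skip the (astronomically unlikely) hash collision
            self._owner[p] = agent
            bisect.insort(self._points, p)

    def remove_agent(self, agent):
        # Drop every virtual point owned by the departing (e.g. crashed) agent.
        self._points = [p for p in self._points if self._owner[p] != agent]
        self._owner = {p: a for p, a in self._owner.items() if a != agent}

    def agent_for(self, host):
        """Return the owner of the first ring point clockwise from hash(host)."""
        if not self._points:
            raise LookupError("no agents on the ring")
        i = bisect.bisect_right(self._points, _point(host))
        return self._owner[self._points[i % len(self._points)]]


if __name__ == "__main__":
    hosts = [f"host{i}.example.org" for i in range(1000)]
    ring = ConsistentHashRing(["agent-a", "agent-b", "agent-c"])
    before = {h: ring.agent_for(h) for h in hosts}
    ring.remove_agent("agent-b")  # simulate a faulty agent leaving
    moved = sum(before[h] != ring.agent_for(h) for h in hosts)
    print(f"{moved} of {len(hosts)} hosts reassigned")  # only agent-b's hosts move
```

The sketch also suggests why consistent hashing fits graceful degradation. When an agent disappears, only the hosts it owned are redistributed. Every other host keeps its agent, because its nearest clockwise point is untouched. Surviving agents therefore need no global reassignment or central coordinator to agree on who crawls what.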

Related Organizations

- National Research Council (Italy)
- University of Iowa (United States)
- Consiglio Nazionale delle Ricerche - Istituto di Informatica e Telematica (Italy)
- University of Modena and Reggio Emilia (Italy)
- University of Milan (Italy)

Impact by BIP!

- Citations: 394 (an alternative to the "Influence" indicator; it reflects the overall impact of the article, based on the underlying citation network)
- Popularity: Top 1% (the current impact or attention the article receives, based on the underlying citation network)
- Influence: Top 0.1% (the overall impact of the article over time, based on the underlying citation network)
- Impulse: Top 1% (the initial momentum of the article directly after publication, based on the underlying citation network)