
doi: 10.30684/etj.29.3.9
The World Wide Web (WWW) has grown from a few thousand pages in 1993 to more than eight billion pages today. Because of this explosive growth, web search engines have become the primary means of locating relevant information. This research aims to build a crawler that retrieves the most important web pages. The crawling system consists of three main techniques. The first is a Best-First technique, used to select the most important page to crawl next. The second is a distributed crawling technique based on UbiCrawler, used to distribute the URLs of the selected web pages across several machines. The third is a duplicate-page detection technique that uses a proposed document-fingerprint algorithm.
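The three techniques named in the abstract can be sketched in a few lines each. The code below is an illustrative assumption, not the paper's actual implementation: the importance score, the whitespace-normalizing fingerprint, and the host-hashing assignment are all placeholders (UbiCrawler itself uses consistent hashing of host names; the paper proposes its own fingerprint algorithm).

```python
import hashlib
import heapq


class BestFirstFrontier:
    """Best-first URL frontier sketch: URLs are popped in order of an
    externally supplied importance score (scoring itself is assumed)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so heapq never compares URL strings

    def push(self, url, score):
        # heapq is a min-heap, so negate the score to pop the best page first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url


def fingerprint(page_text):
    """Illustrative duplicate-page fingerprint (not the paper's algorithm):
    normalize case and whitespace, then hash the result, so two pages that
    differ only in layout whitespace collide."""
    normalized = " ".join(page_text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def assign_machine(url, n_machines):
    """UbiCrawler-style assignment sketch: hash the host part of the URL so
    every URL of one site is always handled by the same crawler machine."""
    host = url.split("/")[2] if "//" in url else url
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_machines
```

Under these assumptions, the frontier always yields the highest-scored URL first, pages that differ only in whitespace share a fingerprint and are flagged as duplicates, and all URLs of one host are routed to one machine, which avoids inter-machine coordination per URL.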
Keywords: search engine, web crawling, fingerprint
