A Multi-Threaded Semantic Focused Crawler

Publication: Article, 01 Nov 2012, English
Publisher: Springer Science and Business Media LLC
Journal: Journal of Computer Science and Technology, volume 27, pages 1233-1242 (issn: 1000-9000, eissn: 1860-4749)

Authors: Anjali Thukral; Abhishek Behl; Hema Banati; Varun Mendiratta; Punam Bedi

doi: 10.1007/s11390-012-1299-8

Abstract

The Web contains a large volume of rich learning content. The ever-growing volume of learning resources, however, leads to information overload. The large number of irrelevant results that search engines return through keyword matching aggravates the problem. A learner in this situation needs search results that are semantically matched learning resources. Given the volume of content and the significance of semantic knowledge, this paper proposes a multi-threaded semantic focused crawler (SFC) designed and implemented to crawl the WWW for educational learning content. The proposed SFC uses a domain ontology to expand a topic term and a set of seed URLs to initiate the crawl. Results from multiple iterations of the crawl on various topics are presented and compared with results obtained by running an open-source crawler on a similar dataset. The results are evaluated using Semantic Similarity, a vector space model based metric, and the harvest ratio.
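
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so this assumes that Semantic Similarity is computed as the cosine between term-frequency vectors of the ontology-expanded topic terms and a page's text, and that the harvest ratio is the fraction of crawled pages scoring at or above a relevance threshold. The function names, the sample terms and pages, and the threshold of 0.2 are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a lowercase bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def harvest_ratio(scores, threshold):
    """Fraction of crawled pages whose score meets the threshold."""
    if not scores:
        return 0.0
    relevant = sum(1 for s in scores if s >= threshold)
    return relevant / len(scores)


# Hypothetical topic "sorting", expanded with assumed ontology terms.
expanded_topic = term_vector("sorting algorithm quicksort mergesort heapsort")

# Hypothetical text of two crawled pages.
pages = [
    "Quicksort is a sorting algorithm based on partitioning.",
    "Cheap flights and hotel deals for your next holiday.",
]

scores = [cosine_similarity(expanded_topic, term_vector(p)) for p in pages]
ratio = harvest_ratio(scores, threshold=0.2)  # threshold is an assumption
```

In this toy run the first page shares three terms with the expanded topic and the second shares none, so one of the two pages counts as relevant. A real crawler would likely weight terms (for example with TF-IDF) instead of using raw counts.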

Related Organizations

University of Delhi
India

Related research products

crawler4j software on Google Code (relation: IsRelatedTo)

Impact by BIP!

- citations (18): This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity (Top 10%): This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence (Top 10%): This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse (Average): This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.