Powered by OpenAIRE graph

Effect of feature selection method on the performance of focused crawlers—A case study on traditional and accelerated focused crawlers

Authors: R. Krishna Chaitanya; G.V Padma Raju; N.V.G. Sirisha Gadiraju

Abstract

This paper examines how the choice of feature selection method affects the performance of the Traditional Focused Crawler (TFC) and the Accelerated Focused Crawler (AFC). Information retrieval methods such as querying a search engine, consulting a web catalog, and browsing may not satisfy the information needs of every user. When the information need concerns a specific topic, focused crawlers can complement these methods: their aim is to download web pages that are highly relevant to a predefined topic. A Naive Bayesian classifier guides the crawlers by rating each web page before it is downloaded. For this analysis, each topic to be crawled is represented by a set of relevant documents. The features the Bayesian classifier uses to build its model are selected from this document corpus using two feature selection methods, Document Frequency (DF) and Information Gain (IG). The performance of both crawlers is evaluated with 500 features selected by each method. The AFC's performance is further evaluated for varying numbers of features selected by both methods. Target page recall and target description recall are used to evaluate the crawlers.
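The two feature selection methods named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each document is already tokenized into a list of terms and, because Information Gain needs class labels, that the corpus contains both relevant and non-relevant documents (the labels `"rel"` and `"irr"` and the toy corpus are hypothetical). DF counts how many documents contain each term; IG uses the standard definition IG(t) = H(C) - P(t)·H(C|t) - P(not t)·H(C|not t). The helper names `document_frequency`, `information_gain`, and `top_k` are illustrative only.

```python
import math
from collections import Counter


def document_frequency(docs):
    """DF: for each term, the number of documents that contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def _entropy(counts, total):
    """Shannon entropy (bits) of a class distribution given raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """IG(t) = H(C) - P(t) H(C|t) - P(~t) H(C|~t), with presence/absence of t per document."""
    n = len(docs)
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    h_c = _entropy(class_counts.values(), n)

    # term_class[t][c] = number of class-c documents that contain term t
    term_class = {}
    for tokens, label in zip(docs, labels):
        for t in set(tokens):
            term_class.setdefault(t, Counter())[label] += 1

    ig = {}
    for t, with_t in term_class.items():
        n_t = sum(with_t.values())          # documents containing t (always >= 1 here)
        n_not = n - n_t                     # documents not containing t
        without_t = [class_counts[c] - with_t[c] for c in classes]
        h_given_t = _entropy([with_t[c] for c in classes], n_t)
        h_given_not = _entropy(without_t, n_not) if n_not > 0 else 0.0
        ig[t] = h_c - (n_t / n) * h_given_t - (n_not / n) * h_given_not
    return ig


def top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: two on-topic and two off-topic documents.
docs = [["crawler", "topic", "web"], ["crawler", "focused"], ["recipe", "web"], ["recipe", "cake"]]
labels = ["rel", "rel", "irr", "irr"]
print(top_k(document_frequency(docs), 3))        # ['crawler', 'recipe', 'web']
print(top_k(information_gain(docs, labels), 3))  # ['crawler', 'recipe', 'cake']
```

In this toy corpus, "web" occurs equally often in both classes, so DF keeps it among the top three terms while IG scores it 0 and drops it. This is one concrete way in which the choice of method can change the feature set the classifier sees; the paper's actual corpora and the 500-feature setting are not reproduced here.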
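The abstract states that a Naive Bayesian classifier guides the crawl by rating each page before it is downloaded. One common way to realize this is sketched below under stated assumptions: a multinomial Naive Bayes model with add-one (Laplace) smoothing, restricted to a feature-selected vocabulary, whose posterior relevance for a candidate link becomes that link's priority in a best-first frontier. The class names `NaiveBayes` and `Frontier`, the use of anchor-text tokens as the pre-download evidence, and the example URLs are all hypothetical; the paper's TFC and AFC may score and order links differently.

```python
import heapq
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over a fixed, feature-selected vocabulary."""

    def __init__(self, vocabulary):
        self.vocab = set(vocabulary)

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            # Count only tokens that survived feature selection.
            counts = Counter(
                t
                for tokens, label in zip(docs, labels) if label == c
                for t in tokens if t in self.vocab
            )
            total = sum(counts.values())
            # Add-one smoothing so unseen selected features keep nonzero probability.
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(self.vocab))) for t in self.vocab
            }
        return self

    def prob(self, tokens, target):
        """Posterior P(target | tokens); tokens outside the vocabulary are ignored."""
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in tokens if t in self.vocab)
            for c in self.classes
        }
        m = max(scores.values())  # log-sum-exp trick for numerical stability
        z = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores[target] - m) / z


class Frontier:
    """Best-first crawl frontier: always pops the unvisited URL with the highest score."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:  # enqueue each URL at most once
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))  # heapq is a min-heap, so negate

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Hypothetical usage: train on a labeled toy corpus, then rate two links by anchor text.
docs = [["crawler", "topic", "web"], ["crawler", "focused"], ["recipe", "web"], ["recipe", "cake"]]
labels = ["rel", "rel", "irr", "irr"]
nb = NaiveBayes(["crawler", "focused", "recipe", "cake"]).fit(docs, labels)
frontier = Frontier()
frontier.push("http://a.example/crawl", nb.prob(["focused", "crawler"], "rel"))
frontier.push("http://b.example/cake", nb.prob(["cake"], "rel"))
print(frontier.pop()[0])  # http://a.example/crawl
```

Scoring links before fetching them is what lets the crawler spend its download budget on likely-relevant pages first; the vocabulary passed to `NaiveBayes` is where the output of a DF or IG selection step would plug in.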
