This paper examines the effect of the feature selection method on the performance of the Traditional Focused Crawler (TFC) and the Accelerated Focused Crawler (AFC). Information retrieval methods such as querying a search engine, using a web catalog, and browsing may not satisfy the information needs of all users. When the information need concerns a specific topic, focused crawlers complement these methods. The aim of these crawlers is to download web pages that are highly relevant to a pre-defined topic. A Naive Bayesian classifier guides the crawlers by rating each web page before it is downloaded. For this analysis, each topic to be crawled is represented by a set of relevant documents. The features the Bayesian classifier uses to construct its model are collected from the document corpus using the Document Frequency and Information Gain feature selection methods. The performance of both crawlers is evaluated when 500 features are selected with each method. The Accelerated Focused Crawler's performance is also evaluated for varying numbers of features gathered using both methods. Target pages recall and target description recall are used to evaluate the crawlers.
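As a rough illustration of the two feature selection methods named in the abstract, the Python sketch below scores terms by Document Frequency and by Information Gain and keeps the top k. It is not the paper's implementation: the function names, the pre-tokenized toy documents, and the use of a separate set of irrelevant documents as the negative class for Information Gain are all assumptions made for the example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(pos, neg):
    """Binary entropy (in bits) of a split with `pos` and `neg` documents."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:  # skip empty classes; 0 * log(0) is taken as 0
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels):
    """Information gain of each term's presence/absence for 0/1 labels.

    Assumption: label 1 marks a relevant document, 0 an irrelevant one.
    """
    n = len(docs)
    n_pos = sum(labels)
    base = entropy(n_pos, n - n_pos)

    df_all, df_pos = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        terms = set(tokens)
        df_all.update(terms)
        if label:
            df_pos.update(terms)

    ig = {}
    for term, with_t in df_all.items():
        pos_with = df_pos[term]
        neg_with = with_t - pos_with
        without = n - with_t
        pos_without = n_pos - pos_with
        neg_without = without - pos_without
        # Expected entropy after splitting on whether the term occurs.
        cond = (with_t / n) * entropy(pos_with, neg_with)
        if without:
            cond += (without / n) * entropy(pos_without, neg_without)
        ig[term] = base - cond
    return ig


def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: two relevant and two irrelevant documents.
docs = [
    ["focused", "crawler", "web"],
    ["crawler", "topic", "web"],
    ["recipe", "web"],
    ["recipe", "cake"],
]
labels = [1, 1, 0, 0]

df_features = top_k(document_frequency(docs), 2)
ig_features = top_k(information_gain(docs, labels), 2)
```

In the setting the abstract describes, k would be 500 and the selected terms would form the vocabulary of the Naive Bayesian classifier. Note that Document Frequency favors terms that are merely common (here "web"), whereas Information Gain favors terms that separate relevant from irrelevant documents.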