
Text categorization assigns each document in an input set to one of two or more classes. As the volume of data available in digital form grows, organizing it becomes necessary. Feature extraction is a crucial step in text classification, yet most existing text classifiers fail to capture the relations among terms. We propose a probabilistic approach to text classification that also considers the nonlinear relations among terms. The model combines a domain ontology graph (DOG) with the Markov clustering (MCL) algorithm: an ontology graph is constructed using the DOG model and then clustered with MCL. The approach scales to large datasets, and its classification power is not affected when the number of relations among terms is large. In our experiments, the system achieves 91% accuracy for 8 categories; accuracy decreases to 88% for 10 categories and to 85% for 12. We compared our classifier with the existing Naive Bayes and k-Nearest Neighbor classifiers, and the results show that the proposed model is more accurate than both. These results indicate that the proposed system is effective.
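The clustering step named above can be illustrated with a minimal sketch of generic Markov clustering on a small graph. This is not the authors' implementation: the function name `markov_cluster`, the adjacency-matrix input, the added self-loops, and the parameter values (`expansion=2`, `inflation=2.0`) are assumptions chosen for illustration, and the construction of the ontology graph itself is not shown.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster an undirected graph with a basic Markov clustering loop.

    `adjacency` is a square, symmetric matrix of non-negative edge weights.
    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    a = np.asarray(adjacency, dtype=float)
    a = a + np.eye(a.shape[0])            # self-loops keep every column sum positive
    m = a / a.sum(axis=0, keepdims=True)  # column-stochastic transition matrix

    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow spreads along paths
        m = np.power(m, inflation)                # inflation: strong flows are boosted
        m = m / m.sum(axis=0, keepdims=True)      # renormalize each column
        if np.allclose(m, prev, atol=tol):
            break

    # Each row that still carries flow describes one cluster:
    # its nonzero columns are the nodes attracted to that row.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > tol).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical term graph: two triangles of related terms joined by one weak link.
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(graph))
```

In a setting like the one described, each node would be a term and each edge weight a measure of relatedness between two terms. A higher `inflation` value tends to produce more, smaller clusters.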
