Text categorization assigns each document in an input set to one of two or more classes. As the volume of data available in digital form grows, organizing it becomes necessary. Feature extraction is a crucial step in text classification, yet most existing text classifiers fail to capture the relations among terms. We propose a probabilistic approach to text classification that also considers the nonlinear relations among terms. The model combines a domain ontology graph (DOG) with the Markov clustering (MCL) algorithm: an ontology graph is constructed using the DOG model and then clustered with MCL. The approach scales to large datasets, and its classification power is not affected when the number of relations among terms is large. In our experiments the system is 91% accurate for 8 categories; accuracy falls to 88% for 10 categories and to 85% for 12. We compared our classifier with existing Naive Bayes and k-Nearest Neighbor classifiers, and the experimental results show that the proposed model is more accurate than both. These results indicate that the proposed system is effective.
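The clustering step can be illustrated with a minimal sketch of the generic MCL procedure (self-loops, column normalization, then alternating expansion and inflation until the matrix stops changing). This is not the authors' implementation: the function names, the inflation value of 2.0, the convergence tolerance, and the six-node toy graph are all assumptions made for illustration, and the graph here is hand-written rather than produced by the DOG model.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, adequate for a toy-sized graph."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a square adjacency matrix.

    Hypothetical helper: parameter names and defaults are illustrative.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then turn edge weights into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walks
        inflated = normalize_columns(  # inflation: boost strong transitions
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps non-negligible mass is an attractor; the columns
    # it still reaches form one cluster. Duplicate rows collapse in the set.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy term graph (an assumption): two triangles joined by the edge 2-3.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy))
```

On this toy graph the single bridging edge carries too little flow to survive inflation, so the two triangles come out as separate clusters. A larger inflation value generally yields finer clusters, which is the main knob one would tune when applying such a step to an ontology graph of terms.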

Related Organizations

Panjab University
India
University Institute of Engineering and Technology, Panjab University
India
GIET University
India
