Author Identification using Sequential Minimal Optimization with rule-based Decision Tree on Indian Literature in Marathi

Article, 01 Jan 2018, English
Publisher: Elsevier BV
Journal: Procedia Computer Science, volume 132, pages 1086-1101 (ISSN: 1877-0509)

Authors: Kale Sunil Digamberrao; Rajesh S. Prasad

doi: 10.1016/j.procs.2018.05.024

Abstract

Authorship identification is the task of determining who wrote a given piece of text from a set of candidate authors (suspects). The rapidly growing volume of text on the Internet makes authorship identification an increasingly urgent need. A large body of work already exists for English. Comparatively little research has been carried out for Indian regional languages such as Tamil, Telugu, Bengali and Punjabi, and no such experiment is available for Marathi. This study presents a strategy for identifying the authors of documents written in Marathi. We adopted a set of fine-grained lexical and stylistic features to analyze the text and used them to develop two models: a statistical similarity model and SMORDT (Sequential Minimal Optimization with a rule-based Decision Tree). We then validated the feature extraction method, showing that the features are consistently significant in every model used in the experiment. The performance of the proposed approach was evaluated using Recall, Precision, F-measure and Accuracy.
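The abstract names Recall, Precision, F-measure and Accuracy but does not say how per-author scores are combined into a single figure. The sketch below is a minimal illustration that computes these metrics from true and predicted author labels, assuming macro-averaging over candidate authors. That averaging scheme is an assumption and is not confirmed by the paper. The function name `evaluate` and the author labels "A", "B" and "C" are hypothetical.

```python
from collections import Counter


def evaluate(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F-measure.

    Assumption: per-author scores are averaged with equal weight (macro
    averaging); the paper may aggregate differently.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    authors = sorted(set(y_true) | set(y_pred))
    # True positives per author: documents correctly attributed to that author.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)  # TP + FP for each author
    true_count = Counter(y_true)  # TP + FN for each author

    precisions, recalls, f_scores = [], [], []
    for author in authors:
        p = tp[author] / pred_count[author] if pred_count[author] else 0.0
        r = tp[author] / true_count[author] if true_count[author] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    n = len(authors)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


# Hypothetical example: six test documents by three candidate authors.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(evaluate(y_true, y_pred))
```

In this toy example, four of the six attributions are correct, so accuracy is 4/6. Under macro-averaging, each author's precision and recall count equally regardless of how many documents that author contributed. This is one common choice when the candidate set is small and unbalanced.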

Related Organizations

Pune Institute of Computer Technology, India
Sinhgad Institute of Technology, India

Impact (by BIP!)

Selected citations: 11
Popularity: Top 10%
Influence: Top 10%
Impulse: Average