
Abstract

Authorship identification is the task of determining which of a given set of candidate authors (suspects) wrote a given piece of text. The ever-growing volume of text on the Internet makes authorship identification both important and urgent. A large body of work already exists for English; comparatively little research has been carried out for Indian regional languages such as Tamil, Telugu, Bengali and Punjabi, and no such experiment has been reported for Marathi. This study presents a strategy for identifying the authorship of documents written in Marathi. We adopted a set of fine-grained lexical and stylistic features for analyzing the text and used them to develop two models: a statistical similarity model and SMORDT (Sequential Minimal Optimization with a rule-based Decision Tree). We then validated the feature extraction method, showing that the extracted features are consistently significant in every model used in this experiment. The performance of the proposed approach was evaluated in terms of Recall, Precision, F-measure and Accuracy.
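The abstract does not specify how the statistical similarity model scores candidates or how the metrics are averaged over authors. The following is a minimal Python sketch of one common approach, not the authors' implementation. It assumes a nearest-centroid model: each candidate author is represented by the mean of a few illustrative lexical and stylistic features (average word length, average sentence length, type-token ratio, punctuation rates), and a disputed text is attributed to the author with the highest cosine similarity. The feature set, the function names (`features`, `train`, `attribute`, `evaluate`) and the macro-averaging of Precision, Recall and F-measure are all hypothetical choices. SMORDT is not reproduced here.

```python
import math
import re

# Sentence boundaries: '.', '!', '?' and the Devanagari danda (U+0964).
SENT_SPLIT = re.compile(r"[.!?\u0964]+")
# Punctuation marks whose per-word rate is used as a stylistic feature (illustrative choice).
PUNCT = [",", ";", ":", "!", "?", "\u0964"]
# Characters stripped from the edges of whitespace-separated tokens.
STRIP = ",;:!?.\u0964\"'()"


def features(text):
    """Map a document to a small, illustrative lexical/stylistic feature vector."""
    words = [w.strip(STRIP) for w in text.split()]
    words = [w for w in words if w]
    sentences = [s for s in SENT_SPLIT.split(text) if s.strip()]
    n_words = max(len(words), 1)
    vec = [
        sum(len(w) for w in words) / n_words,   # average word length (code points)
        len(words) / max(len(sentences), 1),    # average sentence length (words)
        len(set(words)) / n_words,              # type-token ratio (vocabulary richness)
    ]
    vec += [text.count(p) / n_words for p in PUNCT]  # punctuation marks per word
    return vec


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(docs_by_author):
    """Build one profile (mean feature vector) per candidate author."""
    return {author: centroid([features(d) for d in docs])
            for author, docs in docs_by_author.items()}


def attribute(text, profiles):
    """Return the candidate whose profile is most similar to the disputed text."""
    v = features(text)
    return max(profiles, key=lambda author: cosine(v, profiles[author]))


def evaluate(y_true, y_pred):
    """Macro-averaged precision, recall and F-measure, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        totals["precision"] += prec
        totals["recall"] += rec
        totals["f1"] += 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    scores = {k: v / len(labels) for k, v in totals.items()}
    scores["accuracy"] = sum(t == p for t, p in pairs) / len(pairs)
    return scores
```

Words are split on whitespace rather than matched with `\w+`. Python's `\w` does not match Devanagari vowel signs, so it would break Marathi words into fragments. Because the raw features sit on very different scales, a practical version would likely standardize them before computing cosine similarity.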
