Authorship Attribution via Network Motifs Identification

Publication: Article, Preprint, Conference object, Other literature type
Date: 01 Oct 2016
Embargo end date: 01 Jan 2016
Publisher: IEEE
Journal: 2016 5th Brazilian Conference on Intelligent Systems (BRACIS)
Funded by: NSERC | unidentified

Authors: Vanessa Q. Marinho; Graeme Hirst; Diego R. Amancio

doi: 10.1109/bracis.2016.071, 10.48550/arxiv.1607.06961

arXiv: 1607.06961

Abstract

Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been applied to study the authorship attribution problem, the use of motifs for text analysis has been restricted to a few works. The goal of this paper is to apply the concept of motifs, recurrent interconnection patterns, in the authorship attribution task. The absolute frequencies of all thirteen directed motifs with three nodes were extracted from the co-occurrence networks and used as classification features. The effectiveness of these features was verified with four machine learning methods. The results show that motifs are able to distinguish the writing style of different authors. In our best scenario, 57.5% of the books were correctly classified. The chance baseline for this problem is 12.5%. In addition, we have found that function words play an important role in these recurrent patterns. Taken together, our findings suggest that motifs should be further explored in other related linguistic tasks.
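The feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the co-occurrence networks were built, so the sketch assumes a directed edge from each token to the token that follows it, and all function names (`cooccurrence_edges`, `canonical_code`, `motif_features`, and so on) are hypothetical. It enumerates every three-node subset by brute force, which is only practical for small vocabularies.

```python
from collections import Counter
from itertools import combinations, permutations


def cooccurrence_edges(tokens):
    """Directed edges between adjacent, distinct tokens (assumed construction)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def is_connected(nodes, edges):
    """Three nodes are weakly connected when at least two of the three pairs are linked."""
    a, b, c = nodes

    def linked(x, y):
        return (x, y) in edges or (y, x) in edges

    return linked(a, b) + linked(a, c) + linked(b, c) >= 2


def canonical_code(nodes, edges):
    """Smallest 6-bit adjacency code over all orderings of the three nodes.

    Two induced subgraphs get the same code exactly when they are isomorphic.
    """
    best = None
    for p in permutations(nodes):
        code = 0
        for i in range(3):
            for j in range(3):
                if i != j:
                    code = (code << 1) | ((p[i], p[j]) in edges)
        best = code if best is None else min(best, code)
    return best


def all_motif_codes():
    """Canonical codes of every connected directed graph on three nodes."""
    pairs = [(i, j) for i in range(3) for j in range(3) if i != j]
    codes = set()
    for mask in range(64):
        edges = {p for k, p in enumerate(pairs) if (mask >> k) & 1}
        if is_connected((0, 1, 2), edges):
            codes.add(canonical_code((0, 1, 2), edges))
    return sorted(codes)


def motif_counts(tokens):
    """Absolute frequency of each three-node motif in the token network."""
    edges = cooccurrence_edges(tokens)
    nodes = sorted({n for e in edges for n in e})
    counts = Counter()
    for triple in combinations(nodes, 3):
        if is_connected(triple, edges):
            counts[canonical_code(triple, edges)] += 1
    return counts


def motif_features(tokens):
    """Fixed-length feature vector, one entry per motif type."""
    counts = motif_counts(tokens)
    return [counts[code] for code in all_motif_codes()]


features = motif_features("the cat saw the dog".split())
print(len(features), features)
```

The vector returned by `motif_features` has one entry per motif type, so a vector per book could be passed to any standard classifier. For realistic text sizes, a dedicated subgraph-counting routine would be a better fit than this exhaustive enumeration.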

Preprint submitted for the 5th Brazilian Conference on Intelligent Systems

Related Organizations

Universidade de São Paulo
Brazil
University of Toronto
Canada
University of Toronto / Department of Computer Science
Canada

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)

Impact by BIP!

- Selected citations: 16. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.