Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Eastern-European Jou...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Eastern-European Journal of Enterprise Technologies
Article . 2019 . Peer-reviewed
Data sources: Crossref
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Development of the quantitative method for automated text content authorship attribution based on the statistical analysis of N-grams distribution

Authors: Vasyl Lytvyn; Victoria Vysotska; Ihor Budz; Yaroslav Pelekh; Nataliia Sokulska; Roman Kovalchuk; Lyudmyla Dzyubyk; +2 Authors

Development of the quantitative method for automated text content authorship attribution based on the statistical analysis of N-grams distribution

Abstract

The peculiarities of the application of linguo-statistics technologies for the identification of the style of the author of text content of scientific and technical profile are considered. Quantitative linguistic analysis of a text uses the benefits of content monitoring based on the NLP methods to identify and analyze the set of stop words, keywords, set phrases and to study N-gram. The latter are used in the linguometry methods to determine in per cent if the given text belongs to a particular author. The quantitative method for automatic text content authorship attribution was developed based on statistical analysis of the 3-gram distribution. The approach to the implementation of identification of the author of the text in the Ukrainian language of the scientific and technical profile was proposed. Experimental results of the proposed method to determine the belonging of the analyzed text to a specific author in the presence of the reference text were obtained. Application of the linguo-statistical analysis of the 3-grams to a set of articles will make it possible to form a subset of publications that are similar in linguistic descriptions. Imposing additional conditions in the form of statistical and quantitative analyses (a set of keywords, set expressions, stylometric, linguometric analyses, etc.) on a subset will allow a significant reduction of this subset by specifying the list of the most likely author. For qualitative and effective content analysis when determining the degree of authorship of a particular author, we propose to analyze the reference text and the one under consideration at several stages: linguometric analysis of the coefficients of the diversity of the author's speech, stylometric analysis, analysis of set expressions, linguo-statistical analysis of 3-grams. For automated text processing, not only the frequency of occurrence of a certain category, but also its existence in the studied text in general are important. Quantitative computation makes it possible to draw objective conclusions about the orientation of materials by the number of using the units of analysis in the studied texts. Qualitative analysis does the same, but as a result of the study of whether (and in what context) there is a certain important original category in general

Keywords

UDC 004.89, NLP; content; content-monitoring; stop-words; content-analysis; statistical linguistic analysis; quantitative linguistics; statistical linguistics; linguometry, NLP; контент; контент-мониторинг; стоп-слова; контент-анализ; статистический лингвистический анализ; квантытативная лингвистика; статистическая лингвистика; лингвометрия, NLP; контент; контент-моніторінг; стоп-слова; контент-аналіз; статистийний лінгвістичний аналіз; квантитативна лінгвістика; статистична лінгвістика; лінгвометрія

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    6
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 3
    download downloads 10
  • 3
    views
    10
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
6
Top 10%
Top 10%
Average
3
10
Green
gold