Development of the linguometric method for automatic identification of the author of text content based on statistical analysis of language diversity coefficients

Abstract

<p>We have developed a linguometric method that provides algorithmic support for content monitoring processes and solves the problem of automatically identifying the author of Ukrainian text content through statistical analysis of language diversity coefficients. The authorship identification method was decomposed according to the analysis of such speech factors as lexical diversity, the degree of syntactic complexity, speech coherence, and the exclusivity and concentration indices of a text. The following parameters of the author's style were analyzed: the number of distinct words in the text, the total number of words in the text, the number of sentences, the number of prepositions, the number of conjunctions, the number of words that occur exactly once, and the number of words that occur 10 or more times. A distinctive feature of the developed method is that the morphological and syntactic analysis of lexical units is adapted to the structural peculiarities of Ukrainian words and texts. That is, when word-level linguistic units are analyzed, both their part of speech and their declension within that part of speech are taken into account. To this end, word inflections were analyzed in order to classify the words and to separate their stems, from which the corresponding alphabetic-frequency dictionaries were built. The contents of these dictionaries were then used at the subsequent stages of authorship identification, such as calculating the parameters and coefficients of the author's speech. Syntactic (stop or anchor) words are the most essential for an author's individual style, since they are not related to the subject or content of the publication.
We compared the results on a set of 200 single-author papers in the technical field, written by more than 100 different authors over the period 2001–2017, to determine whether and how the text diversity coefficients of these authors change over time. For the selected experimental base of more than 200 papers, the best results according to the density criterion were achieved when an article was analyzed without its mandatory supplementary elements, such as abstracts and keywords in different languages, and without the reference list.</p>
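
The stem-separation step that feeds the alphabetic-frequency dictionaries can be illustrated with a deliberately simplified Python sketch. It assumes a tiny hypothetical list of Ukrainian inflectional endings and strips the longest matching ending, whereas the method described in the abstract also takes part of speech and declension into account. The names `ENDINGS`, `MIN_STEM`, `split_stem` and `alphabetic_frequency_dictionary` are illustrative only. Stems are ordered by the Ukrainian alphabet rather than by Unicode code points, because code-point order would place і, ї, є and ґ after я.

```python
import re
from collections import Counter

# Ukrainian alphabet in its standard order (33 letters).
UA_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
_ORDER = {ch: i for i, ch in enumerate(UA_ALPHABET)}
WORD_RE = re.compile(f"[{UA_ALPHABET}'’]+")

# Hypothetical, deliberately incomplete list of inflectional endings.
# The paper's own flection tables (per part of speech and declension) are not reproduced here.
ENDINGS = sorted(
    ["ами", "ові", "ого", "ому", "ими", "ій", "ою", "ах", "ів", "ий",
     "ої", "ам", "ом", "а", "я", "и", "і", "у", "ю", "о", "е"],
    key=len,
    reverse=True,  # longest endings are tried first
)
MIN_STEM = 3  # assumed minimum stem length, so short words are left intact


def split_stem(word: str) -> tuple[str, str]:
    """Split a lower-case word into (stem, ending) using the longest matching ending."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)], ending
    return word, ""


def ua_sort_key(word: str) -> list[int]:
    """Sort key following Ukrainian alphabetical order; other characters sort last."""
    return [_ORDER.get(ch, len(UA_ALPHABET)) for ch in word]


def alphabetic_frequency_dictionary(text: str) -> list[tuple[str, int]]:
    """Return (stem, frequency) pairs sorted in Ukrainian alphabetical order."""
    tokens = (t.strip("'’") for t in WORD_RE.findall(text.lower()))
    stems = Counter(split_stem(t)[0] for t in tokens if t)
    return sorted(stems.items(), key=lambda item: ua_sort_key(item[0]))


print(alphabetic_frequency_dictionary("Мова і мови, мовою!"))  # [('і', 1), ('мов', 3)]
```

Blind suffix stripping like this will over- and under-stem many words; a closer approximation of the described approach would first determine the part of speech and declension class and only then pick the ending from the matching paradigm.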

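The style parameters and speech factors listed in the abstract can be turned into numbers roughly as follows. This is a minimal Python sketch under stated assumptions: the preposition and conjunction sets are small hypothetical samples without part-of-speech disambiguation, sentences are split naively on terminal punctuation, and the coefficient formulas (distinct/total words for lexical diversity, sentences per word for syntactic complexity, prepositions plus conjunctions per sentence for coherence, once-occurring and 10-or-more-occurring words per total words for exclusivity and concentration) are assumed normalizations, not necessarily the exact definitions used in the paper.

```python
import re
from collections import Counter
from dataclasses import dataclass

UA_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
WORD_RE = re.compile(f"[{UA_ALPHABET}'’]+")
SENTENCE_END_RE = re.compile(r"[.!?…]+")

# Small hypothetical samples; a real system would use full, POS-disambiguated lists.
PREPOSITIONS = {"в", "у", "на", "з", "із", "зі", "до", "від", "для", "по",
                "про", "під", "над", "за", "через", "між", "без", "після"}
CONJUNCTIONS = {"і", "й", "та", "а", "але", "або", "чи", "бо", "якщо", "щоб", "проте"}


@dataclass
class StyleCoefficients:
    lexical_diversity: float     # distinct words / total words
    syntactic_complexity: float  # sentences / total words (assumed normalization)
    coherence: float             # (prepositions + conjunctions) / sentences (assumed)
    exclusivity: float           # words occurring once / total words (assumed)
    concentration: float         # words occurring >= 10 times / total words (assumed)


def count_parameters(text: str) -> dict[str, int]:
    """Count the style parameters listed in the abstract for one text."""
    lowered = text.lower()
    words = [w.strip("'’") for w in WORD_RE.findall(lowered)]
    words = [w for w in words if w]
    freq = Counter(words)
    # Naive sentence split: any chunk between terminal marks that contains a letter.
    sentences = [s for s in SENTENCE_END_RE.split(lowered)
                 if any(ch in UA_ALPHABET for ch in s)]
    return {
        "total_words": len(words),
        "distinct_words": len(freq),
        "sentences": len(sentences),
        "prepositions": sum(freq[w] for w in PREPOSITIONS),
        "conjunctions": sum(freq[w] for w in CONJUNCTIONS),
        "hapax": sum(1 for c in freq.values() if c == 1),
        "frequent": sum(1 for c in freq.values() if c >= 10),
    }


def coefficients(p: dict[str, int]) -> StyleCoefficients:
    """Derive the (assumed) diversity coefficients from the raw counts."""
    n, s = p["total_words"], p["sentences"]
    if n == 0 or s == 0:
        raise ValueError("the text must contain at least one word and one sentence")
    return StyleCoefficients(
        lexical_diversity=p["distinct_words"] / n,
        syntactic_complexity=s / n,
        coherence=(p["prepositions"] + p["conjunctions"]) / s,
        exclusivity=p["hapax"] / n,
        concentration=p["frequent"] / n,
    )


params = count_parameters("Мова і мови. Вона на столі!")
print(params)
print(coefficients(params))
```

A per-text coefficient vector of this kind is what would be compared across one author's papers from different years; in line with the reported results, abstracts, keywords, and the reference list would be removed from each article before counting.
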
Authors and Affiliations

Vasyl Lytvyn, Victoria Vysotska, Petro Pukach, Zinovii Nytrebych, Ihor Demkiv, Roman Kovalchuk, Nadiia Huzyk


Related Articles

A method developed to calculate lateral earth pressure on a sheet pile wall with counterforts

A method has been developed for calculating the lateral earth pressure on a sheet pile wall with counterforts of various shapes – rectangular, trapezoidal with downward expansion, and trapezoidal with upward expansion...

Development of dimensionally stable structures of multilayer pipelines and cylindrical pressure vessels from carbon fiber reinforced plastic

In the framework of the momentless theory of cylindrical thin shells, the elastic deformation of multilayer pipes and pressure vessels is investigated. It is assumed that the pipes and pressure vessels are made by two...

CFD modelling of particle size effect on stoker coal-fired boilers combustion

In the previous study, CFD simulation had been developed to predict combustion characteristics of the Fluidized Bed Boiler and Pulverized Boiler. The high demand for coal used for stoker-fired boilers in Indonesia the p...

Influence of the thermal factor on the composition of electron-beam high-entropy AlTiVCrNbMo coatings

This paper reports results of studying the element and phase compositions of electron-beam coatings based on the high-entropy alloy AlTiVCrNbMo, depending on the deposition temperature (in the range of 300...700 °C)...

Development of formulations for sponge cakes made from organic raw materials using the principles of a food products safety management system

To control the safety of sponge cakes made from organic raw materials in line with the HACCP principles, we have developed two sample sponge cakes "Winter delight" and "Exotic". To make the semi-finished sponge cake "...

  • EP ID EP528152
  • DOI 10.15587/1729-4061.2018.142451

How To Cite

Vasyl Lytvyn, Victoria Vysotska, Petro Pukach, Zinovii Nytrebych, Ihor Demkiv, Roman Kovalchuk, Nadiia Huzyk (2018). Development of the linguometric method for automatic identification of the author of text content based on statistical analysis of language diversity coefficients. Eastern-European Journal of Enterprise Technologies (Восточно-Европейский журнал передовых технологий), 5(2), 16-28. https://europub.co.uk/articles/-A-528152