Statistical distribution and fluctuations of sentence lengths in Ukrainian, Russian and English corpora


We study the statistical distributions of sentence frequency over sentence length for Ukrainian, Russian and English corpora and determine the average sentence lengths measured in linguistic signs, letters and words. The tails of the distributions are satisfactorily described by an exponential function or related functions, which is consistent with the random nature of sentence length. We show that the fluctuations of the frequency of sentences of a given length depend on the average value of that frequency according to Taylor's power law. The significant relative fluctuations of the frequency, and the relative changes in the average sentence length, confirm the importance of fluctuation phenomena in statistical linguistics.
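To make the Taylor's power law statement concrete, the following is a minimal sketch of how such a dependence could be checked, not the authors' procedure. It assumes the law is written as sigma = a * mean^b, where mean and sigma are the average and the standard deviation of the number of sentences of a given length across several sub-corpora; the abstract does not say which form or which sampling scheme was used. The function names `sentence_lengths`, `linear_fit` and `taylor_exponent` are hypothetical, and the sentence splitter is deliberately naive.

```python
import math
import re
from collections import Counter


def sentence_lengths(text):
    """Return the length in words of each sentence in `text`.

    Assumption: sentences end at '.', '!' or '?'. Real corpora need a
    proper sentence tokenizer.
    """
    sentences = re.split(r"[.!?]+", text)
    return [len(s.split()) for s in sentences if s.split()]


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def taylor_exponent(samples):
    """Fit sigma = a * mean**b over all sentence lengths.

    `samples` is a list of sub-corpora, each given as a list of sentence
    lengths. For every length, the frequency (count) is taken in each
    sub-corpus; its mean and population standard deviation across the
    sub-corpora give one point of the log-log fit.
    Returns the pair (b, a).
    """
    counts = [Counter(sample) for sample in samples]
    lengths = set().union(*counts)
    log_mean, log_sigma = [], []
    for length in lengths:
        freqs = [c[length] for c in counts]  # a Counter yields 0 if absent
        mean = sum(freqs) / len(freqs)
        var = sum((f - mean) ** 2 for f in freqs) / len(freqs)
        if mean > 0 and var > 0:  # points with zero spread cannot be logged
            log_mean.append(math.log(mean))
            log_sigma.append(0.5 * math.log(var))
    b, log_a = linear_fit(log_mean, log_sigma)
    return b, math.exp(log_a)


if __name__ == "__main__":
    # Toy sub-corpora, invented purely for illustration.
    texts = [
        "Short one. A slightly longer sentence here. Yes! Is it so?",
        "Another short one. Here is a sentence of some length. No! Why not?",
        "One more. This sentence has six words now. Go! Go on. Go on then.",
    ]
    exponent, prefactor = taylor_exponent([sentence_lengths(t) for t in texts])
    print(f"b = {exponent:.3f}, a = {prefactor:.3f}")
```

With real data the sub-corpora would be large text samples rather than a few sentences, and the same log-log fit could be restricted to the range of lengths where the counts are high enough to be reliable.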

Authors and Affiliations

Oleg Kushnir, Viktor Dzikovskyi, Lyubomyr Ivanitskiy, Ivan Katerynchuk, Yaroslav Kis



How To Cite

Oleg Kushnir, Viktor Dzikovskyi, Lyubomyr Ivanitskiy, Ivan Katerynchuk, Yaroslav Kis (2016). Statistical distribution and fluctuations of sentence lengths in Ukrainian, Russian and English corpora. Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì, 854, 228-239.