Statistics of words occurrences in natural and random texts

Abstract

We experimentally study the statistical distributions that describe the appearance of words in a number of natural texts, as well as in random texts derived from them. The probability mass function of the intervals between words is shown to be practically the same for the natural and random texts and to exhibit a fat tail, which is inconsistent with a purely stochastic character of those intervals. The vocabulary growth dynamics found for the natural and random texts deviate significantly from the dynamics predicted by the power-law Heaps’ law. Together with a crossover found in the dictionary of one of the natural texts, these deviations confirm the need to generalize that law.
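The two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes that an "interval" is the gap, in tokens, between successive occurrences of the same word, and that a "random text" is a random permutation of the tokens of the natural text. The abstract states neither definition, and all function names are hypothetical. For reference, Heaps' law in its usual power form predicts a vocabulary size V(n) ≈ K·n^β after n tokens, with constants K and β fitted to the text.

```python
import random
from collections import Counter


def vocabulary_growth(tokens):
    """Return V(n), the number of distinct words among the first n tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def occurrence_intervals(tokens, word):
    """Gaps (in tokens) between successive occurrences of `word`.

    Assumption: this is what "intervals between words" means here.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def interval_pmf(intervals):
    """Empirical probability mass function of the interval lengths."""
    counts = Counter(intervals)
    total = len(intervals)
    return {k: c / total for k, c in sorted(counts.items())}


def shuffled_copy(tokens, seed=0):
    """Assumed construction of a "random text": a permutation of the tokens."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out


# Toy input only; a real study would use full-length texts.
tokens = "the cat saw the dog and the dog saw the cat".split()
natural_growth = vocabulary_growth(tokens)
random_growth = vocabulary_growth(shuffled_copy(tokens))
the_pmf = interval_pmf(occurrence_intervals(tokens, "the"))
```

Comparing `natural_growth` with `random_growth`, and the interval distributions of the same word in both versions of a text, is the kind of comparison the abstract describes. A fit of log V(n) against log n would then show how far either curve departs from a single power law.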

Authors and Affiliations

Oleg Kushnir, Mykola Alfavitskyi, Viktor Dzikovskyi, Lyubomyr Ivanitskyi, Sergiy Rykhlyuk, Volodymyr Sokulskyi

How To Cite

Oleg Kushnir, Mykola Alfavitskyi, Viktor Dzikovskyi, Lyubomyr Ivanitskyi, Sergiy Rykhlyuk, Volodymyr Sokulskyi (2017). Statistics of words occurrences in natural and random texts. Vìsnik Nacìonalʹnogo unìversitetu "Lʹvìvsʹka polìtehnìka". Serìâ Ìnformacìjnì sistemi ta merežì, 872, 162-178. https://europub.co.uk/articles/-A-576484