THE EXTRACTION OF LEXICAL AND METRORHYTHMIC FEATURES WHICH ARE CHARACTERISTIC FOR THE GENRE AND THE STYLE AND FOR THEIR COMBINATIONS WITHIN THE PROCESS OF AUTOMATED PROCESSING OF TEXTS IN RUSSIAN
Journal Title: Современные информационные технологии и ИТ-образование (Modern Information Technologies and IT-Education) - Year 2018, Vol 14, Issue 4
Abstract
This paper describes an algorithm for automatically extracting features that are characteristic of genre and style. The work was carried out as part of the development of a software system, created at the Institute of Computational Technologies of SB RAS (ICT SB RAS), for the complex analysis of metrorhythmic and genre-stylistic characteristics of poetic texts in Russian. The paper presents the structure of this system, which combines original program modules, written by the system developers for single-purpose tasks in the analysis of poetic texts, with open-access software products. The generalized approach, which represents poetic features as a vector, makes it possible to use modern classification algorithms and their ensembles; however, it has disadvantages when only small volumes of text are available, as is the case here. For this reason, a verification step lets specialists adjust the operation of the system on the basis of expert knowledge, and it also makes the classification process transparent. Two Python libraries were used as tools: scikit-learn, which implements the classification algorithms and the methods for combining them, and ELI5, which makes it possible to relate the components of the feature vector to specific features. The extraction of lexical and metrorhythmic features characteristic of genre and style, and of their combinations, improved the automated processing of poetic texts in Russian, as shown on a corpus of poetic texts by A.S. Pushkin and K.N. Batyushkov.
The results obtained can be used to verify the classifier and to compile a list of the features characteristic of a poet's genre and style.
Authors and Affiliations
Vladimir Barakhnin, Olga Kozhemyakina, Elena Rychkova, Ilya Pastushkov, Yuliya Borzilova