COMPARATIVE ANALYSIS OF RELATED SEQUENCES AND THEIR INCREMENTS ON THE BASIS OF DISCRIMINANT ANALYSIS
Journal Title: Современные информационные технологии и ИТ-образование - Year 2018, Vol 14, Issue 3
Abstract
The article studies the relationship between the lengths of orthologous proteins in four organisms, one of which is taken as the base organism (more than 1200 proteins in total). Methods of multivariate statistical analysis are applied to pairs, triples and quadruples (rows) of orthologous protein lengths; the number of such rows ranges from 200 to 400. Analysis of pairwise correlations, orthogonal transformation and cluster analysis allowed us to distinguish two homogeneous clusters of length quadruples. In parallel, we studied the increments of orthologous protein length relative to the base organism. We showed that the rows form a non-homogeneous sample, whereas the increments form a homogeneous one. The next task was to extend the clusters with rows that have incomplete data. Cluster analysis was shown to be inapplicable to this task, so we used discriminant analysis, with the clustering of the complete rows as the training sample. All incomplete rows were assigned to clusters (100 percent separation), after which the dependence of the lengths in each cluster on the base lengths was described by regression equations, and the adequacy of these equations was tested. The statistical analysis led to the following conclusions. For a set of orthologous length series, a generalizing factor was obtained, which we call the size of an orthologous object composed of four orthologous protein lengths. For the given task such object sizes were computed; their group means differ, and they form two separate ranges of values, one for each of the groups obtained by the other methods. For the series of increments of orthologous protein lengths from the four-length objects, analysis by all methods showed that the set is homogeneous. It was also shown that the lengths of orthologous proteins are significantly autocorrelated, as is the case for series associated with the same base series.
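The step of assigning incomplete rows to clusters by discriminant analysis can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear discriminant with a pooled covariance matrix and equal priors, fitted only on the columns that are observed in the incomplete row. The cluster names, the toy length values and the helper names `solve`, `fit_pooled` and `classify` are all hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_pooled(groups, cols):
    """Return per-cluster means and the pooled covariance on the chosen columns."""
    means = {g: [mean(r[c] for r in rows) for c in cols] for g, rows in groups.items()}
    k = len(cols)
    pooled = [[0.0] * k for _ in range(k)]
    for g, rows in groups.items():
        for r in rows:
            d = [r[c] - mu for c, mu in zip(cols, means[g])]
            for i in range(k):
                for j in range(k):
                    pooled[i][j] += d[i] * d[j]
    dof = sum(len(rows) for rows in groups.values()) - len(groups)
    return means, [[v / dof for v in row] for row in pooled]


def classify(row, groups):
    """Assign a row with missing values (None) to a cluster using observed columns only."""
    cols = [i for i, v in enumerate(row) if v is not None]
    means, pooled = fit_pooled(groups, cols)
    x = [row[c] for c in cols]
    scores = {}
    for g, mu in means.items():
        w = solve(pooled, mu)  # w = Sigma^{-1} mu_g
        # Linear discriminant score with equal priors (an assumption).
        scores[g] = sum(wi * xi for wi, xi in zip(w, x)) - 0.5 * sum(
            wi * mi for wi, mi in zip(w, mu)
        )
    return max(scores, key=scores.get)


# Hypothetical training sample: complete four-length rows already split into two clusters.
training = {
    "small": [[210, 205, 220, 215], [250, 262, 240, 255],
              [300, 290, 310, 285], [270, 280, 265, 290]],
    "large": [[610, 600, 630, 620], [700, 720, 690, 705],
              [650, 640, 665, 670], [760, 750, 740, 775]],
}

# Hypothetical incomplete rows: None marks a missing orthologous protein length.
incomplete = [[280, None, 275, None], [None, 690, None, 700]]
labels = [classify(row, training) for row in incomplete]
print(labels)
```

Refitting the discriminant on the observed columns of each incomplete row avoids imputing the missing lengths; whether this matches the procedure used in the article is not stated in the abstract.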
Authors and Affiliations
Svetlana Istomina