COMPARATIVE ANALYSIS OF RELATED SEQUENCES AND THEIR INCREMENTS ON THE BASIS OF DISCRIMINANT ANALYSIS

Abstract

This article studies the relationship between the lengths of orthologous proteins from four organisms, one of which is taken as the base organism (more than 1200 proteins in total). Methods of multivariate statistical analysis are applied to pairs, triples, and quadruples (rows) composed of the lengths of orthologous proteins; the number of such rows ranges from 200 to 400. Analysis of pairwise correlations, orthogonal transformation, and cluster analysis made it possible to distinguish two homogeneous clusters of length quadruples. In parallel, we studied the increments of orthologous protein length relative to the base organism. We showed that the rows form a heterogeneous sample, whereas the increments form a homogeneous one. The next task was to extend the clusters with rows that have incomplete data. Cluster analysis was shown to be inapplicable to this task, so we used discriminant analysis, with the clustering of the complete-data rows serving as the training sample. All incomplete rows were assigned to clusters with 100 percent separation, after which the dependence of the lengths in each cluster on the base organism was described by regression equations, and the adequacy of these equations was tested. The statistical analysis led to the following conclusions. For a set of orthologous length rows, a generalizing factor was obtained, which we call the size of an orthologous object composed of four orthologous protein lengths. For the given task, such object sizes were obtained; their group means differ and form two separate ranges of values, one for each of the groups obtained by the other methods. For the series of length increments derived from the four-length objects, analysis by all methods showed that the set is homogeneous. The lengths of orthologous proteins were also shown to have significant autocorrelation, as is the case for series associated with the same base series.
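The workflow described in the abstract (increments relative to the base organism, then discriminant analysis trained on complete rows to place incomplete ones) can be illustrated with a minimal Python sketch. This is not the author's procedure: the function names, the tiny data set, and the use of a diagonal linear discriminant that scores only the observed coordinates are all assumptions made for illustration, since the abstract does not say how missing lengths were handled.

```python
from statistics import mean

def increments(row):
    """Length increments of each ortholog relative to the base organism (first entry)."""
    base = row[0]
    return [x - base for x in row[1:]]

def pooled_variance(a, b):
    """Pooled within-class variance of one coordinate for two classes."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)

def classify(row, cluster_a, cluster_b):
    """Assign a row with missing values (None) to cluster 'A' or 'B'.

    Assumption: a diagonal linear discriminant with equal priors. Only the
    observed coordinates contribute, and covariances between coordinates
    are ignored, which a full discriminant analysis would not do.
    """
    score = 0.0
    for j, x in enumerate(row):
        if x is None:
            continue
        col_a = [r[j] for r in cluster_a]
        col_b = [r[j] for r in cluster_b]
        ma, mb = mean(col_a), mean(col_b)
        var = pooled_variance(col_a, col_b)
        # Contribution of coordinate j to the linear discriminant score
        score += (x - (ma + mb) / 2) * (ma - mb) / var
    return "A" if score > 0 else "B"

# Hypothetical training sample: complete rows of four protein lengths
# (base organism first), already split into two clusters.
cluster_a = [[210, 215, 205, 220], [250, 248, 255, 251], [190, 200, 195, 188]]
cluster_b = [[820, 810, 835, 828], [900, 915, 890, 905], [760, 770, 755, 765]]

# Hypothetical incomplete rows: None marks a missing ortholog length.
incomplete = [[230, None, 240, None], [None, 850, None, 845]]

for row in incomplete:
    print(row, "->", classify(row, cluster_a, cluster_b))

print(increments(cluster_a[0]))
```

Scoring only the observed coordinates lets every incomplete row be classified without imputing lengths. A full linear discriminant would also use the covariances between the four lengths, which matters when the lengths are strongly correlated, as the abstract reports.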

Authors and Affiliations

Svetlana Istomina

  • EP ID EP521850
  • DOI 10.25559/SITITO.14.201803.672-678

How To Cite

Svetlana Istomina (2018). COMPARATIVE ANALYSIS OF RELATED SEQUENCES AND THEIR INCREMENTS ON THE BASIS OF DISCRIMINANT ANALYSIS. Современные информационные технологии и ИТ-образование, 14(3), 672-678. https://europub.co.uk/articles/-A-521850