PaMSA: A Parallel Algorithm for the Global Alignment of Multiple Protein Sequences

Journal Title: International Journal of Advanced Computer Science & Applications - Year 2017, Vol 8, Issue 4

Abstract

Multiple sequence alignment (MSA) is a well-known problem in bioinformatics whose main goal is the identification of evolutionary, structural, or functional similarities in a set of three or more related genes or proteins. We present a parallel approach for the global alignment of multiple protein sequences that combines dynamic programming, heuristics, and parallel programming techniques in an iterative process. In the proposed algorithm, the longest common subsequence technique is used to generate a first MSA by aligning identical residues. An iterative process then improves the MSA by applying a number of operators, defined in the present work, in order to produce more accurate alignments. The accuracy of each alignment was evaluated through the application of optimization functions. A number of processes work independently and simultaneously, searching for the best MSA of a set of sequences. One process acts as a coordinator, whereas the rest are considered slave processes. The resulting algorithm was called PaMSA, which stands for Parallel MSA. The MSA accuracy and response time of PaMSA were compared against those of Clustal W, T-Coffee, MUSCLE, and Parallel T-Coffee on 40 datasets of protein sequences. When run as a sequential application, PaMSA was the second fastest of the nonparallel MSA methods tested (Clustal W, T-Coffee, and MUSCLE). However, PaMSA was designed to be executed in parallel. When run as a parallel application, PaMSA presented better response times than Parallel T-Coffee under the conditions tested. Furthermore, the sum-of-pairs scores achieved by PaMSA when aligning groups of sequences with an identity percentage score from approximately 70% to 100% were the highest in all cases. PaMSA was implemented on a cluster platform in the C++ language using the standard Message Passing Interface (MPI) library.
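
The abstract names two building blocks that are easy to illustrate in isolation: the longest common subsequence (LCS), used to seed the first alignment from identical residues, and the sum-of-pairs score, used to compare alignments. The following is a minimal sketch of both, not the authors' implementation. It is written in Python for brevity (PaMSA itself is reported to be in C++ with MPI), the function names lcs and sum_of_pairs are hypothetical, and the match, mismatch, and gap values are illustrative assumptions; the abstract does not state which scoring scheme PaMSA uses.

```python
from itertools import combinations


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover the residues.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum pairwise column scores over every pair of rows.

    The scoring values are assumptions chosen for illustration only.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for col in range(width):
        column = [row[col] for row in alignment]
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing in this sketch
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```

For example, lcs("ACDEFG", "ADEG") returns "ADEG", the residues that an LCS-seeded alignment would place in the same columns. A parallel search of the kind the abstract describes could then have each worker process score its candidate alignment with a function like sum_of_pairs and report the result to the coordinator, although the abstract does not specify that exchange.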

Authors and Affiliations

Irma R. Andalon-Garcia, Arturo Chavoya

Keywords

Multiple Sequence Alignment; parallel programming; Message Passing Interface

EP ID EP258409
DOI 10.14569/IJACSA.2017.080468