PaMSA: A Parallel Algorithm for the Global Alignment of Multiple Protein Sequences

Abstract

Multiple sequence alignment (MSA) is a well-known problem in bioinformatics whose main goal is to identify evolutionary, structural, or functional similarities in a set of three or more related genes or proteins. We present a parallel approach for the global alignment of multiple protein sequences that combines dynamic programming, heuristics, and parallel programming techniques in an iterative process. In the proposed algorithm, the longest common subsequence technique is used to generate a first MSA by aligning identical residues. An iterative process then improves the MSA by applying a number of operators defined in the present work in order to produce more accurate alignments. The accuracy of the alignment was evaluated through the application of optimization functions. In the proposed algorithm, a number of processes work independently and simultaneously, each searching for the best MSA of a set of sequences. One process acts as the coordinator, whereas the remaining processes are considered slave processes. The resulting algorithm was called PaMSA, which stands for Parallel MSA. The MSA accuracy and response time of PaMSA were compared against those of Clustal W, T-Coffee, MUSCLE, and Parallel T-Coffee on 40 datasets of protein sequences. When run as a sequential application, PaMSA was the second fastest of the nonparallel MSA methods tested (Clustal W, T-Coffee, and MUSCLE). However, PaMSA was designed to be executed in parallel. When run as a parallel application, PaMSA presented better response times than Parallel T-Coffee under the conditions tested. Furthermore, the sum-of-pairs scores achieved by PaMSA when aligning groups of sequences with an identity percentage score from approximately 70% to 100% were the highest in all cases. PaMSA was implemented on a cluster platform in the C++ language using the standard Message Passing Interface (MPI) library.
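As a rough illustration of two building blocks named in the abstract, the sketch below computes a longest common subsequence of two residue strings by dynamic programming and a sum-of-pairs score for a gapped alignment. It is not the PaMSA implementation: the function names `lcs` and `sumOfPairs` are hypothetical, and the scoring scheme (+1 for identical residues, 0 for a mismatch, -1 for a residue against a gap, 0 for two gaps) is an assumption chosen for simplicity, whereas protein aligners normally score with a substitution matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Longest common subsequence of two residue strings (dynamic programming).
// Hypothetical helper for illustration; not taken from PaMSA.
std::string lcs(const std::string& a, const std::string& b) {
    const std::size_t n = a.size(), m = b.size();
    // len[i][j] holds the LCS length of the suffixes a[i..] and b[j..].
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = m; j-- > 0;) {
            len[i][j] = (a[i] == b[j])
                            ? len[i + 1][j + 1] + 1
                            : std::max(len[i + 1][j], len[i][j + 1]);
        }
    }
    // Walk forward through the table to recover one LCS.
    std::string out;
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            out += a[i];
            ++i;
            ++j;
        } else if (len[i + 1][j] >= len[i][j + 1]) {
            ++i;
        } else {
            ++j;
        }
    }
    return out;
}

// Sum-of-pairs score of an alignment given as equal-length gapped rows.
// Assumed scheme: identical residues +1, mismatch 0, residue/gap -1, gap/gap 0.
int sumOfPairs(const std::vector<std::string>& rows, char gap = '-') {
    if (rows.empty()) return 0;
    const std::size_t cols = rows[0].size();
    for (const std::string& r : rows) assert(r.size() == cols);
    int score = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t p = 0; p < rows.size(); ++p) {
            for (std::size_t q = p + 1; q < rows.size(); ++q) {
                const char x = rows[p][c], y = rows[q][c];
                if (x == gap && y == gap) continue;   // gap/gap contributes 0
                if (x == gap || y == gap) score -= 1; // residue against a gap
                else if (x == y) score += 1;          // identical residues
            }
        }
    }
    return score;
}
```

Under these assumptions, `lcs("ACDEF", "ADF")` yields `"ADF"`, and the three-row alignment `AC-EF`, `ACDEF`, `A-DEF` scores 7. In a coordinator/slave arrangement such as the one the abstract describes, a score of this kind is what each process would report so that the best alignment can be selected.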

Authors and Affiliations

Irma R. Andalon-Garcia, Arturo Chavoya

  • EP ID EP258409
  • DOI 10.14569/IJACSA.2017.080468

How To Cite

Irma R. Andalon-Garcia, Arturo Chavoya (2017). PaMSA: A Parallel Algorithm for the Global Alignment of Multiple Protein Sequences. International Journal of Advanced Computer Science & Applications, 8(4), 513-522. https://europub.co.uk/articles/-A-258409