A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Abstract

Protein fold recognition plays an important role in computational protein analysis, since it can help determine the function of proteins whose structure is unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on Grammatical Inference. The Mix & Test algorithm minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed up its performance. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. To evaluate performance, 36 SCOP protein folds are used; the accuracy rate is 75.84% on training data and 59.7% on testing data.
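The abstract does not spell out how Mix & Test enumerates patterns or what the testing function computes, so the following is only a minimal Python sketch of the general two-phase idea under stated assumptions, not the authors' algorithm. It assumes patterns are fixed-length gapped subsequences whose residues fall within a bounded span, that a fold keeps the patterns occurring in at least a given fraction of its training sequences, and that an unknown protein is assigned to the fold whose patterns it matches best. The function names and the parameters `k`, `max_span` and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_patterns(seq: str, k: int, max_span: int) -> set[str]:
    """Hypothetical pattern extractor: distinct length-k subsequences of `seq`
    whose first and last residues lie fewer than `max_span` positions apart.
    Gaps between the chosen residues are allowed; no alignment is needed."""
    found = set()
    for start in range(len(seq)):
        window = seq[start + 1 : start + max_span]
        # combinations() keeps the original left-to-right order of residues
        for rest in combinations(window, k - 1):
            found.add(seq[start] + "".join(rest))
    return found


def mine_fold_patterns(sequences: list[str], k: int = 3, max_span: int = 5,
                       min_support: float = 0.5) -> set[str]:
    """Training sketch: keep patterns present in at least `min_support` of the
    fold's sequences. Each in-memory sequence is visited once, with no
    projected databases."""
    counts = Counter()
    for seq in sequences:
        # gapped_patterns returns a set, so each sequence votes at most once
        counts.update(gapped_patterns(seq, k, max_span))
    threshold = min_support * len(sequences)
    return {p for p, c in counts.items() if c >= threshold}


def train(folds: dict[str, list[str]], **params) -> dict[str, set[str]]:
    """Mine one pattern set per fold label."""
    return {fold: mine_fold_patterns(seqs, **params) for fold, seqs in folds.items()}


def predict_fold(seq: str, model: dict[str, set[str]],
                 k: int = 3, max_span: int = 5) -> str:
    """Testing sketch (an assumed stand-in for the paper's testing function):
    score each fold by the fraction of its patterns found in the query and
    return the best-scoring fold; ties go to the first fold in `model`."""
    query = gapped_patterns(seq, k, max_span)

    def score(patterns: set[str]) -> float:
        return len(query & patterns) / len(patterns) if patterns else 0.0

    return max(model, key=lambda fold: score(model[fold]))


if __name__ == "__main__":
    # Toy, made-up sequences; a real run would use members of SCOP folds.
    training = {
        "fold_A": ["ACDEAC", "ACDFAC", "ACDEGC"],
        "fold_B": ["WYWYKL", "WYKWYL", "WYWKYL"],
    }
    model = train(training)
    print(predict_fold("ACDEAA", model))  # fold_A
    print(predict_fold("WYWYKK", model))  # fold_B
```

Because each fold's patterns are mined independently, the training loop in this sketch parallelizes naturally per fold or per sequence chunk, which is one plausible reading of the abstract's parallelized variant; the paper's actual parallelization strategy is not described here.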

Authors and Affiliations

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie

  • EP ID EP116496
  • DOI 10.14569/IJACSA.2014.051214

How To Cite

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie (2014). A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition. International Journal of Advanced Computer Science & Applications, 5(12), 97-106. https://europub.co.uk/articles/-A-116496