A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Abstract

Protein fold recognition plays an important role in computational protein analysis because it can determine the function of proteins whose structure is unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on Grammatical Inference. The Mix & Test algorithm minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is used to speed up its performance. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. To test performance, 36 SCOP protein folds are used; the accuracy rate is 75.84% on training data and 59.7% on testing data.
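
The two-phase idea described in the abstract (mine class-labeled sequential patterns in one pass over the training sequences, then score an unknown sequence against each fold's patterns) can be illustrated with a minimal Python sketch. This is not the Mix & Test algorithm or the paper's testing function, whose details the abstract does not give. It is a generic gapped-subsequence miner under stated assumptions: the function names `gapped_patterns`, `train`, and `predict`, the parameters `length`, `max_gap`, and `min_support`, the overlap-based score, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def gapped_patterns(seq, length=2, max_gap=1):
    """Collect every subsequence of `length` residues in `seq` in which
    consecutive chosen residues are separated by at most `max_gap`
    skipped positions. No alignment is needed, so sequences may have
    different lengths."""
    found = set()
    n = len(seq)

    def extend(pos, pattern):
        if len(pattern) == length:
            found.add(pattern)
            return
        # The next residue may sit 1 .. max_gap + 1 positions ahead.
        for nxt in range(pos + 1, min(n, pos + max_gap + 2)):
            extend(nxt, pattern + seq[nxt])

    for start in range(n):
        extend(start, seq[start])
    return found


def train(labeled_sequences, length=2, max_gap=1, min_support=2):
    """Single pass over (fold, sequence) pairs: count in how many
    sequences of each fold a pattern occurs, then keep the patterns
    that reach `min_support` (an assumed threshold)."""
    counts = defaultdict(Counter)
    for fold, seq in labeled_sequences:
        for pattern in gapped_patterns(seq, length, max_gap):
            counts[fold][pattern] += 1
    return {
        fold: {p for p, c in counter.items() if c >= min_support}
        for fold, counter in counts.items()
    }


def predict(model, seq, length=2, max_gap=1):
    """Assumed scoring rule: pick the fold whose mined patterns are
    best covered by the patterns observed in the query sequence."""
    observed = gapped_patterns(seq, length, max_gap)
    scores = {
        fold: (len(observed & patterns) / len(patterns)) if patterns else 0.0
        for fold, patterns in model.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up training data: two hypothetical folds "a" and "b".
training = [
    ("a", "ACDAC"),
    ("a", "ACDKC"),
    ("b", "GGHGG"),
    ("b", "GGHKG"),
]
model = train(training, length=2, max_gap=1, min_support=2)
prediction = predict(model, "MACDC", length=2, max_gap=1)
```

In this sketch the training data is read once and all pattern enumeration happens on the in-memory sequence, which mirrors the single-scan and gap-handling properties the abstract lists, though the actual algorithm may achieve them quite differently.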

Authors and Affiliations

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie

  • EP ID EP116496
  • DOI 10.14569/IJACSA.2014.051214

How To Cite

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie (2014). A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition. International Journal of Advanced Computer Science & Applications, 5(12), 97-106. https://europub.co.uk/articles/-A-116496