A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Abstract

Protein fold recognition plays an important role in computational protein analysis because it can help determine the function of proteins whose structure is unknown. This paper proposes a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF). The CSPF technique consists of two main phases: a sequential pattern mining phase and a fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on Grammatical Inference. The Mix & Test algorithm minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences held in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed it up. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. Performance is evaluated on 36 SCOP protein folds, with an accuracy of 75.84% on training data and 59.7% on testing data.
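
The two-phase idea in the abstract (mine gapped patterns per fold in one pass, then score an unknown sequence against them) can be illustrated with a small sketch. This is not the Mix & Test algorithm or the paper's testing function, neither of which is specified in the abstract. It is a simplified stand-in under stated assumptions: patterns are residue pairs separated by a bounded gap, and the names `gapped_patterns`, `train`, `predict`, and `max_gap` are hypothetical.

```python
from collections import Counter, defaultdict


def gapped_patterns(seq, max_gap=2):
    """Return the set of (a, gap, b) patterns found in seq.

    Each pattern is residue a followed by residue b with `gap`
    residues in between (0 <= gap <= max_gap). Assumed pattern
    shape for illustration only; no alignment is needed, so
    sequences of different lengths are handled directly.
    """
    patterns = set()
    for i, a in enumerate(seq):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j < len(seq):
                patterns.add((a, gap, seq[j]))
    return patterns


def train(labelled_sequences, max_gap=2):
    """Single pass over (fold, sequence) pairs.

    Counts, per fold, how many training sequences contain each pattern.
    """
    support = defaultdict(Counter)
    fold_sizes = Counter()
    for fold, seq in labelled_sequences:
        fold_sizes[fold] += 1
        support[fold].update(gapped_patterns(seq, max_gap))
    return support, fold_sizes


def predict(seq, support, fold_sizes, max_gap=2):
    """Pick the fold whose patterns best cover seq.

    The score (summed pattern support, normalised by fold size)
    is a hypothetical scoring rule, not the paper's testing function.
    """
    patterns = gapped_patterns(seq, max_gap)

    def score(fold):
        return sum(support[fold][p] for p in patterns) / fold_sizes[fold]

    # sorted() makes tie-breaking deterministic.
    return max(sorted(fold_sizes), key=score)


# Toy data: fold labels and sequences are made up for the example.
training = [("a", "ACACAC"), ("a", "ACAC"), ("b", "GTGTGT"), ("b", "GTG")]
support, fold_sizes = train(training)
print(predict("CACA", support, fold_sizes))
```

The training step reads each sequence exactly once and keeps only per-fold pattern counts in memory, which mirrors the single-scan, no-projection properties the abstract claims, though the real algorithm may organise its patterns quite differently.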

Authors and Affiliations

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie

  • EP ID EP116496
  • DOI 10.14569/IJACSA.2014.051214

How To Cite

Taysir Soliman, Ahmed Eldin, Marwa Ghareeb, Mohammed Marie (2014). A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition. International Journal of Advanced Computer Science & Applications, 5(12), 97-106. https://europub.co.uk/articles/-A-116496