Predicting Protein Localization Sites Using an Ensemble Self-Labeled Framework

Journal Title: Biomedical Journal of Scientific & Technical Research (BJSTR) - Year 2018, Vol 11, Issue 2

Abstract

In recent years, machine learning has been used extensively in bioinformatics and biomedicine. Predicting the cellular localization of proteins is a significant task in bioinformatics, since a wrong localization site can cause various diseases and infections in humans. Ensemble learning algorithms and semi-supervised algorithms have been developed independently to build efficient and robust classification models. In this paper, we focus on predicting protein localization sites in the organisms Escherichia coli and Saccharomyces cerevisiae using a semi-supervised self-labeled algorithm based on ensemble methodologies. The experimental results demonstrate the efficiency of the proposed algorithm compared with state-of-the-art self-labeled techniques.

Proteins are important molecules in our cells, made up of long sequences of amino acid residues [1]. Each protein in the body has a specific function and works normally when it is at its correct localization site. A protein's function is generally affected by its cellular localization (the location the protein occupies in a cell), and incorrect localization contributes to many diseases, such as cardiovascular, metabolic and neurodegenerative diseases and cancer [2]. Protein localization is also of high interest in research areas such as therapeutic target discovery, drug design and biological research [3]. Predicting the cellular localization of proteins is therefore a significant and widely studied task in bioinformatics [4-6]. In general, a prediction tool takes as input some attributes of a protein, such as its amino acid sequence, and predicts where the protein resides in the cell, such as the nucleus or the endoplasmic reticulum. X-ray crystallography, electron crystallography and nuclear magnetic resonance are among the traditional biochemical experimental methods adopted [7] for determining protein cellular location.
These methods are generally accurate and precise, but they are impractical because they are expensive and time-consuming. Therefore, over the last two decades, computational methods, especially those based on machine learning, have been developed to make predictions [5,8-17]. Escherichia coli (E. coli) and Saccharomyces cerevisiae (Yeast) are two well-characterized unicellular organisms that have been studied exhaustively [18]. Each organism has different proteins located throughout its cell, and these proteins must be at their correct positions. A wrong localization site of these proteins in the cell can cause various diseases and infections in humans, such as bloody diarrhea [19].

In the past, there have been significant efforts to predict the localization sites of proteins [18-28]. Anastasiadis and Magoulas [18] investigated the performance of k-nearest neighbours, feed-forward neural networks with and without cross-validation, and ensemble-based techniques for predicting protein localization sites in E. coli and Yeast. Their results showed that the ensemble-based techniques had the highest average classification accuracy per class, achieving 91.7% and 66.2% for E. coli and Yeast respectively. Chen [22] implemented three machine learning techniques (decision tree, perceptron and two-layer feed-forward neural network) for predicting the cellular localization of proteins on the E. coli and Yeast datasets. All three techniques achieved similar prediction accuracy: 65% to 70% on the E. coli dataset and 46% to 50% on the Yeast dataset. Sengur [23] investigated the performance of an artificial immune system based on the fuzzy k-NN algorithm. The highest average classification accuracy was 97.29% for E. coli and 76.4% for Yeast. Bouziane et al. [21] utilized four supervised machine learning algorithms for the prediction of cellular localization sites of proteins.
For their experiments, they used Naïve Bayesian, k-nearest neighbour and feed-forward neural network classifiers. The highest classification accuracy they achieved was 95.8% for the E. coli dataset and 73.4% for the Yeast dataset. Very recently, Priya and Chhabra [19] proposed a hybrid model of Support Vector Machine and the LogitBoost technique for predicting protein localization sites in E. coli bacteria. The maximum classification accuracy achieved was 95.23%. Motivated by previous work, Satu et al. [20] utilized the E. coli and Yeast datasets for the problem of protein localization prediction. For their experiments, they used several data mining classification algorithms: lazy classifiers (kNN, KStar), meta classifiers (Iterative Classifier Optimizer, LogitBoost, Random Committee, Rotation Forest), function classifiers (Logistic, Simple Logistic), tree classifiers (LMT, Random Forest, Random Tree) and artificial neural networks. The maximum classification accuracy was 87.50% with Rotation Forest for E. coli and 60.53% with Random Forest for Yeast.
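
As a rough illustration of the self-labeled ensemble idea summarized in the abstract, the following sketch lets a small committee of classifiers assign labels to unlabeled examples and adds an example to the labeled set only when the committee agrees. It is not the algorithm proposed in the paper. The choice of k-NN learners with k = 1, 3, 5, the unanimity threshold, the function names, the two-dimensional toy features and the site labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_label(labeled, x, k):
    """Label x by majority vote among its k nearest labeled points."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_label(labeled, x, ks=(1, 3, 5)):
    """Vote across k-NN learners; return (label, fraction of learners agreeing)."""
    votes = Counter(knn_label(labeled, x, k) for k in ks)
    label, count = votes.most_common(1)[0]
    return label, count / len(ks)


def self_label(labeled, unlabeled, threshold=1.0, max_iter=10):
    """Iteratively move confidently labeled points into the labeled set.

    threshold is the minimum ensemble agreement (hypothetical default:
    all learners must agree) needed to accept a predicted label.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            label, agreement = ensemble_label(labeled, x)
            if agreement >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new was labeled, so stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy data: made-up 2-D feature vectors with illustrative site labels.
seed = [
    ((0.0, 0.0), "cytoplasm"), ((0.2, 0.1), "cytoplasm"), ((0.1, 0.3), "cytoplasm"),
    ((5.0, 5.0), "periplasm"), ((5.2, 4.9), "periplasm"), ((4.9, 5.3), "periplasm"),
]
unlabeled_points = [(0.3, 0.2), (4.8, 5.1), (1.0, 0.8), (4.0, 4.2)]

final_labeled, leftover = self_label(seed, unlabeled_points)
```

Here `final_labeled` holds the seed examples plus every point the committee labeled, and `leftover` holds any points that never reached the agreement threshold. Requiring agreement before accepting a label is one simple way to limit the propagation of wrong labels, at the cost of leaving ambiguous points unlabeled.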

Authors and Affiliations

Emmanuel G Pintelas, Panagiotis Pintelas

Download PDF file
  • EP ID EP588432
  • DOI 10.26717/BJSTR.2018.11.002066

How To Cite

Emmanuel G Pintelas, Panagiotis Pintelas (2018). Predicting Protein Localization Sites Using an Ensemble Self-Labeled Framework. Biomedical Journal of Scientific & Technical Research (BJSTR), 11(2), 8364-8370. https://europub.co.uk/articles/-A-588432