Predicting Protein Localization Sites Using an Ensemble Self-Labeled Framework

Journal Title: Biomedical Journal of Scientific & Technical Research (BJSTR) - Year 2018, Vol 11, Issue 2

Abstract

In recent years, machine learning has been widely used in bioinformatics and biomedicine. Predicting the cellular localization of proteins is a significant task in bioinformatics, since a wrong localization site can cause various diseases and infections in humans. Ensemble learning algorithms and semi-supervised algorithms have been developed independently to build efficient and robust classification models. In this paper we focus on predicting protein localization sites in the organisms Escherichia coli and Saccharomyces cerevisiae using a semi-supervised self-labeled algorithm based on ensemble methodologies. The experimental results showed the efficiency of our proposed algorithm compared against state-of-the-art self-labeled techniques.

Proteins are important molecules in our cells, made up of long sequences of amino acid residues [1]. Each protein within the body has a specific function, and it works normally only when it is in the correct localization site. The function of a protein can be affected by its cellular localization (the location a protein has in a cell), and mislocalization contributes to many diseases, such as cardiovascular, metabolic and neurodegenerative diseases and cancer [2]. Protein localization is also of high interest in various research areas, such as therapeutic target discovery, drug design and biological research [3]. Therefore, predicting the cellular localization of proteins is a helpful and significant task in bioinformatics that has been studied extensively [4-6].

In general, a prediction tool takes as input some attributes of a protein, such as its amino acid sequence, and predicts the location where the protein resides in the cell, such as the nucleus or the endoplasmic reticulum. X-ray crystallography, electron crystallography and nuclear magnetic resonance are some of the traditional biochemical experimental methods adopted [7] for determining protein cellular location.
These methods are generally accurate and precise, but they are impractical because they are expensive and time-consuming. Therefore, over the last two decades, computational methods, especially those based on machine learning, have been developed to make predictions [5,8-17].

Escherichia coli (E. coli) and Saccharomyces cerevisiae (Yeast) are two well-characterized unicellular organisms that have been exhaustively studied [18]. These two organisms have different proteins located in their cells, and each protein must be at its correct position. A wrong localization site of these proteins in the cell can cause various diseases and infections in humans, such as bloody diarrhea [19].

In the past, there have been significant efforts to predict the localization sites of proteins [18-28]. Anastasiadis and Magoulas [18] investigated the performance of k-nearest neighbours, feed-forward neural networks with and without cross-validation, and ensemble-based techniques for the prediction of protein localization sites in E. coli and Yeast. Their results showed that the ensemble-based techniques had the highest average classification accuracy per class, achieving 91.7% and 66.2% for E. coli and Yeast respectively. Chen [22] implemented three different machine learning techniques (decision tree, perceptron and two-layer feed-forward neural network) for predicting the cellular localization of proteins on the E. coli and Yeast datasets. All three techniques reached similar prediction accuracy: 65-70% on the E. coli dataset and 46-50% on the Yeast dataset. Sengur [23] investigated the performance of an artificial immune system based on the fuzzy k-NN algorithm. The highest average classification accuracy was 97.29% for E. coli and 76.4% for Yeast. Bouziane et al. [21] utilized four supervised machine learning algorithms for the prediction of cellular localization sites of proteins.
Their experiments included Naïve Bayesian, k-Nearest Neighbour and feed-forward neural network classifiers. The highest classification accuracy they achieved was 95.8% for the E. coli dataset and 73.4% for the Yeast dataset. Very recently, Priya and Chhabra [19] proposed a hybrid model of Support Vector Machine and the LogitBoost technique for the prediction of the protein localization site in E. coli bacteria. The maximum classification accuracy achieved was 95.23%. Motivated by previous work, Satu et al. [20] used the E. coli and Yeast datasets for the problem of protein localization prediction. Their experiments covered several data mining classification algorithms: lazy classifiers (kNN, KStar), meta classifiers (Iterative Classifier Optimizer, LogitBoost, Random Committee, Rotation Forest), function classifiers (Logistic, Simple Logistic), tree classifiers (LMT, Random Forest, Random Tree) and artificial neural networks. The maximum classification accuracy was 87.50% for E. coli, obtained with Rotation Forest, and 60.53% for Yeast, obtained with Random Forest.
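
The general idea of combining self-labeling with an ensemble, which the abstract refers to, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this text; it is a generic self-training loop under stated assumptions: the ensemble members (1-NN, 3-NN and nearest centroid), the unanimous-agreement rule for accepting a pseudo-label, the function names and the toy two-dimensional data are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def knn_predict(train, x, k):
    """Majority label among the k labeled points closest to x."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, x):
    """Label of the class whose mean feature vector is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def ensemble_votes(train, x):
    """Votes of the three illustrative ensemble members (an assumption)."""
    return [knn_predict(train, x, 1), knn_predict(train, x, 3),
            centroid_predict(train, x)]


def self_label(labeled, unlabeled, max_rounds=10):
    """Self-training: move unlabeled points into the labeled set whenever
    all ensemble members agree on their label; stop when nothing changes."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for x in pool:
            votes = ensemble_votes(labeled, x)
            if len(set(votes)) == 1:          # unanimous, so trust the label
                accepted.append((x, votes[0]))
            else:
                remaining.append(x)
        if not accepted:
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


def predict(train, x):
    """Final prediction by majority vote of the ensemble."""
    return Counter(ensemble_votes(train, x)).most_common(1)[0][0]


# Toy data (made up): two features per protein, two illustrative site labels.
labeled = [((0.0, 0.0), "cytoplasm"), ((0.1, 0.2), "cytoplasm"),
           ((1.0, 1.0), "periplasm"), ((0.9, 1.1), "periplasm")]
unlabeled = [(0.2, 0.1), (0.8, 0.9), (0.3, 0.3), (0.7, 0.8)]

enlarged, leftover = self_label(labeled, unlabeled)
print(len(enlarged), len(leftover), predict(enlarged, (0.15, 0.15)))
```

Requiring unanimous agreement is a conservative acceptance rule: it limits the risk of adding wrongly labeled examples to the training set, at the cost of leaving ambiguous points unlabeled. Other self-labeled schemes accept pseudo-labels based on a confidence threshold instead.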

Authors and Affiliations

Emmanuel G Pintelas, Panagiotis Pintelas

  • EP ID EP588432
  • DOI 10.26717/BJSTR.2018.11.002066

How To Cite

Emmanuel G Pintelas, Panagiotis Pintelas (2018). Predicting Protein Localization Sites Using an Ensemble Self-Labeled Framework. Biomedical Journal of Scientific & Technical Research (BJSTR), 11(2), 8364-8370. https://europub.co.uk/articles/-A-588432