Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2015, Vol 6, Issue 12
Abstract
Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is a vital research issue in bioinformatics, where microarray gene expression and network-based methods can be used. These datasets are very large, which causes main-memory problems. In this paper, a Random Committee Node Classifier (RCNC) algorithm is proposed for identifying cancer biomarkers, based on microarray gene expression data and Protein-Protein Interaction (PPI) data. The data are enriched from other public databases, such as IntAct, UniProt, and Gene Ontology (GO). When applied to different datasets, RCNC identifies cancer biomarkers with 99.16% accuracy, 99.96% precision, 99.24% recall, 99.16% F1-measure, and 99.6 ROC. To speed up performance, the algorithm is run within a MapReduce framework, where the MapReduce version of RCNC is much faster than the sequential version on large datasets.
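The abstract does not give the internals of RCNC, so the following is only a minimal sketch of the general idea it describes: a committee of randomized classifiers labels nodes (genes) in a map phase, and a reduce phase combines the votes per node. Every name here (`make_member`, `map_phase`, `reduce_phase`, the 0.5 threshold, the toy feature vectors) is a hypothetical illustration, not the authors' implementation, and the phases run in a single process rather than on a real MapReduce cluster.

```python
import random
from collections import Counter, defaultdict


def make_member(seed, n_features):
    """Build one hypothetical committee member.

    Each member thresholds a single, randomly chosen feature.
    This is an assumed stand-in for a real randomized base classifier.
    """
    rng = random.Random(seed)
    idx = rng.randrange(n_features)

    def predict(features):
        return 1 if features[idx] > 0.5 else 0

    return predict


def map_phase(partition, members):
    """Map: emit one (node_id, predicted_label) pair per member per node."""
    for node_id, features in partition:
        for member in members:
            yield node_id, member(features)


def reduce_phase(pairs):
    """Reduce: majority vote over all labels emitted for each node."""
    votes = defaultdict(Counter)
    for node_id, label in pairs:
        votes[node_id][label] += 1
    return {node_id: counts.most_common(1)[0][0]
            for node_id, counts in votes.items()}


# Toy input split into two partitions, as a MapReduce job would split it.
partitions = [
    [("geneA", [0.9, 0.8, 0.7])],
    [("geneB", [0.1, 0.2, 0.3])],
]
members = [make_member(seed, 3) for seed in range(5)]  # odd size avoids ties

pairs = [pair for part in partitions for pair in map_phase(part, members)]
labels = reduce_phase(pairs)
print(labels)
```

Because each partition is mapped independently, the map phase could in principle be distributed across workers, which is the property the abstract relies on for its speed-up on large datasets.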
Authors and Affiliations
Taysir Soliman
Self-Healing Hybrid Protection Architecture for Passive Optical Networks
The expanding size of passive optical networks (PONs), along with high availability expectations, makes reliability performance a crucial need. Most protection architectures utilize redundant network components to enhance n...
An Incremental Technique of Improving Translation
Statistical machine translation (SMT) refers to using probabilistic methods to learn the translation process, primarily from parallel text. In SMT, linguistic information such as morphology and syntax can be added...
Logarithmic Spiral-based Construction of RBF Classifiers
Clustering process is defined as grouping similar objects together into homogeneous groups or clusters. Objects that belong to one cluster should be very similar to each other, but objects in different clusters will be d...
Applying Machine Learning Techniques for Classifying Cyclin-Dependent Kinase Inhibitors
The importance of protein kinases made them a target for many drug design studies. They play an essential role in cell cycle development and many other biological processes. Kinases are divided into different subfamilies...
Smart Cities: A Survey on Security Concerns
A smart city is developed, deployed and maintained with the help of the Internet of Things (IoT). Smart cities have become an emerging phenomenon with rapid urban growth and a boost in the field of information technology. H...