Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework

Abstract

Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is one of the vital research problems in bioinformatics, and both microarray gene expression and network-based methods can be used for it. These datasets are very large, which causes main-memory problems. In this paper, a Random Committee Node Classifier (RCNC) algorithm is proposed for identifying cancer biomarkers from microarray gene expression data and Protein-Protein Interaction (PPI) data. The data are enriched from other public databases such as IntAct, UniProt, and Gene Ontology (GO). When applied to different datasets, RCNC identifies cancer biomarkers with 99.16% accuracy, 99.96% precision, 99.24% recall, a 99.16% F1-measure, and 99.6 ROC. To improve performance, RCNC is run within a MapReduce framework; on large datasets, the MapReduce version of RCNC is much faster than the sequential version.
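The abstract does not describe RCNC's internals, so the following is only a rough sketch under stated assumptions. It assumes Weka-style Random Committee semantics: each member is a randomizable base learner trained on the full training set with a different random seed, and the committee's prediction is the plain average of the members' probabilities. The randomized decision stump, the two node features (an expression fold change and a PPI degree), and all function and gene names are hypothetical stand-ins, not the paper's method. The map step scores one partition of gene nodes and the reduce step merges the partitions and keeps the predicted biomarkers; here both phases run in-process rather than on Hadoop.

```python
import random

# Hypothetical node records: (gene_id, features, label), label 1 = biomarker.
# Features are illustrative assumptions: (expression fold change, PPI degree).
TRAIN = [
    ("g1", (2.5, 12), 1), ("g2", (3.1, 9), 1), ("g3", (0.2, 2), 0),
    ("g4", (0.4, 1), 0), ("g5", (2.8, 15), 1), ("g6", (0.1, 3), 0),
]

def train_random_stump(rows, seed):
    """Hypothetical randomizable base learner: a one-split tree on a random feature."""
    rng = random.Random(seed)
    f = rng.randrange(len(rows[0][1]))  # feature chosen by this member's seed
    best = None
    for t in sorted({x[f] for _, x, _ in rows}):
        left = [y for _, x, y in rows if x[f] <= t]
        right = [y for _, x, y in rows if x[f] > t]
        # Laplace-smoothed estimate of P(biomarker) on each side of the split.
        p_left = (sum(left) + 1) / (len(left) + 2)
        p_right = (sum(right) + 1) / (len(right) + 2)
        errors = sum(
            ((p_left if x[f] <= t else p_right) >= 0.5) != bool(y)
            for _, x, y in rows
        )
        if best is None or errors < best[0]:
            best = (errors, t, p_left, p_right)
    _, t, p_left, p_right = best
    return lambda x: p_left if x[f] <= t else p_right

def train_random_committee(rows, n_members=10, base_seed=1):
    # Random Committee: every member sees the same data; only the seed differs.
    return [train_random_stump(rows, base_seed + i) for i in range(n_members)]

def committee_proba(members, x):
    # The committee's output is the plain average of the members' probabilities.
    return sum(m(x) for m in members) / len(members)

def map_partition(members, partition):
    # Map step: score every gene node in one partition independently.
    return [(gene, committee_proba(members, x)) for gene, x in partition]

def reduce_scores(mapped_outputs, threshold=0.5):
    # Reduce step: merge per-partition scores and keep predicted biomarkers.
    merged = {}
    for output in mapped_outputs:
        merged.update(output)
    return sorted(gene for gene, p in merged.items() if p >= threshold)

# Unlabelled gene nodes split into two partitions (one per hypothetical map task).
PARTITIONS = [
    [("g7", (2.9, 11)), ("g8", (0.3, 2))],
    [("g9", (1.9, 8)), ("g10", (0.05, 1))],
]

members = train_random_committee(TRAIN)
biomarkers = reduce_scores(map_partition(members, p) for p in PARTITIONS)
print(biomarkers)  # ['g7', 'g9']
```

Because each map task needs only the trained committee and its own partition, partitions can be scored independently, which is the property a MapReduce deployment would exploit. The abstract does not say whether RCNC distributes training, prediction, or both.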

Authors and Affiliations

Taysir Soliman

  • EP ID EP101255
  • DOI 10.14569/IJACSA.2015.061225

How To Cite

Taysir Soliman (2015). Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework. International Journal of Advanced Computer Science & Applications, 6(12), 184-189. https://europub.co.uk/articles/-A-101255