Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework

Abstract

Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is a vital research problem in bioinformatics, where microarray gene expression and network-based methods can be used. These datasets are very large, which causes main-memory problems. In this paper, a Random Committee Node Classifier (RCNC) algorithm is proposed for identifying cancer biomarkers, based on microarray gene expression data and Protein-Protein Interaction (PPI) data. The data are enriched from other public databases, such as IntAct, UniProt, and Gene Ontology (GO). When applied to different datasets, the algorithm identifies cancer biomarkers with 99.16% accuracy, 99.96% precision, 99.24% recall, 99.16% F1-measure, and 99.6 ROC. To speed up performance, it is run within a MapReduce framework, where the RCNC MapReduce algorithm is much faster than the sequential RCNC algorithm on large datasets.
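As a rough illustration of the MapReduce pattern the abstract refers to, the sketch below counts how many interactions each protein has in a PPI edge list, a simple per-node feature. It is not the RCNC algorithm: the function names, the choice of node degree as a feature, and the toy edge list are all assumptions made for illustration, and the phases run in plain Python rather than on a real MapReduce cluster.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map step: emit (protein, 1) for each endpoint of one interaction."""
    a, b = edge
    return [(a, 1), (b, 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(protein, counts):
    """Reduce step: sum the counts to get the protein's degree."""
    return protein, sum(counts)


def node_degrees(edges):
    """Run map, shuffle, and reduce in sequence over an edge list."""
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    return dict(reduce_degree(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy interaction list; not data from the paper.
toy_edges = [("TP53", "MDM2"), ("TP53", "BRCA1"), ("BRCA1", "BARD1")]
degrees = node_degrees(toy_edges)
```

Because each edge is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the general reason a MapReduce formulation can outpace a sequential one on large inputs.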

Authors and Affiliations

Taysir Soliman

  • EP ID EP101255
  • DOI 10.14569/IJACSA.2015.061225

How To Cite

Taysir Soliman (2015). Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework. International Journal of Advanced Computer Science & Applications, 6(12), 184-189. https://europub.co.uk/articles/-A-101255