Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework

Abstract

Big data are giving new research challenges in the life sciences domain because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is one of the vital research issues in bioinformatics field, where microarray gene expression and network based methods can be used. These datasets suffer from the huge data voluminous, causing main memory problems. In this paper, a Random Committee Node Classifier algorithm (RCNC) is proposed for identifying cancer biomarkers, which is based on microarray gene expression data and Protein-Protein Interaction (PPI) data. Data are enriched from other public databases, such as IntACT1 and UniProt2 and Gene Ontology3 (GO). Cancer Biomarkers are identified when applied to different datasets with an accuracy rate an accuracy rate 99.16%, 99.96% precision, 99.24% recall, 99.16% F1-measure and 99.6 ROC. To speed up the performance, it is run within a MapReduce framework, where RCNC MapReduce algorithm is much faster than RCNC sequential algorithm when having large datasets.

Authors and Affiliations

Taysir Soliman

Keywords

Related Articles

Recognizing Rainfall Pattern for Pakistan using Computational Intelligence

Over the world, rainfall patterns and seasons are shifting in new directions due to global warming. In the case of Pakistan, unusual rainfall events may outcome with droughts, floods and other natural disasters along wit...

Evaluating Cancer Treatment Alternatives using Fuzzy PROMETHEE Method

The aim of this study is to apply the principle of multi-criteria decision making theories on various types of cancer treatment techniques. Cancer is an abnormal cell that divides in an uncontrolled manner, it is a growt...

Secure and Privacy Preserving Mail Servers using Modified Homomorphic Encryption (MHE) Scheme

Electronic mail (Email) or the paperless mail is becoming the most acceptable, faster and cheapest way of formal and informal information sharing between users. Around 500 billion mails are sent each day and the count is...

Tsunami Warning System with Sea Surface Features Derived from Altimeter Onboard Satellites

A tsunami warning system based on active database system with satellite derived real-time data of tidal, significant wave height and ocean wind speed as well as assimilation data of sea level changes as one of the global...

Expected Reliability of Everyday- and Ambient Assisted Living Technologies

To receive valuable information about expected reliability in everyday technologies compared to Ambient Assisted Living (AAL) technologies, an online survey was conducted including five everyday (train, dishwasher, navig...

Download PDF file
  • EP ID EP101255
  • DOI 10.14569/IJACSA.2015.061225
  • Views 87
  • Downloads 0

How To Cite

Taysir Soliman (2015). Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework. International Journal of Advanced Computer Science & Applications, 6(12), 184-189. https://europub.co.uk/articles/-A-101255