Identifying Cancer Biomarkers via Node Classification within a MapReduce Framework
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2015, Vol 6, Issue 12
Abstract
Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is one of the vital research issues in bioinformatics, where both microarray gene expression and network-based methods can be used. These datasets are extremely large, which causes main-memory problems. In this paper, a Random Committee Node Classifier algorithm (RCNC) is proposed for identifying cancer biomarkers based on microarray gene expression data and Protein-Protein Interaction (PPI) data. The data are enriched from other public databases, such as IntAct, UniProt, and Gene Ontology (GO). When applied to different datasets, RCNC identifies cancer biomarkers with a 99.16% accuracy rate, 99.96% precision, 99.24% recall, 99.16% F1-measure, and a ROC value of 99.6. To speed up processing, RCNC is run within a MapReduce framework; on large datasets, the MapReduce version of RCNC is much faster than the sequential version.
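The abstract does not say how RCNC splits its work between mappers and reducers, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a random committee in the style of Weka's RandomCommittee (base classifiers trained on the same data with different random seeds, whose class probabilities are averaged), uses a hypothetical randomized decision stump as the base learner, and assigns one committee member per map task, with the reducer averaging each protein node's scores. The function names (`train_random_stump`, `map_task`, `reduce_task`, `random_committee`), the node IDs, and the feature values are all hypothetical.

```python
import random
from collections import defaultdict


def train_random_stump(rows, labels, seed):
    """Train one committee member: a decision stump whose feature and
    threshold are drawn at random (hypothetical base learner)."""
    rng = random.Random(seed)
    feature = rng.randrange(len(rows[0]))
    threshold = rng.choice(rows)[feature]
    # side of threshold -> [count of label 0, count of label 1], Laplace-smoothed
    counts = {True: [1, 1], False: [1, 1]}
    for row, y in zip(rows, labels):
        counts[row[feature] <= threshold][y] += 1

    def predict_proba(row):
        neg, pos = counts[row[feature] <= threshold]
        return pos / (neg + pos)  # estimated P(biomarker)

    return predict_proba


def map_task(seed, train_rows, train_labels, test_nodes):
    """Map: one committee member emits (node_id, P(biomarker)) pairs."""
    model = train_random_stump(train_rows, train_labels, seed)
    for node_id, row in test_nodes:
        yield node_id, model(row)


def reduce_task(pairs):
    """Reduce: group by node and average the members' probabilities."""
    grouped = defaultdict(list)
    for node_id, p in pairs:
        grouped[node_id].append(p)
    return {node_id: sum(ps) / len(ps) for node_id, ps in grouped.items()}


def random_committee(train_rows, train_labels, test_nodes, n_members=10):
    # A real MapReduce job would run each map task on a separate worker;
    # here the map outputs are simply concatenated in one process.
    emitted = []
    for seed in range(n_members):
        emitted.extend(map_task(seed, train_rows, train_labels, test_nodes))
    scores = reduce_task(emitted)
    labels = {node_id: int(s >= 0.5) for node_id, s in scores.items()}
    return labels, scores


# Toy, made-up node features: feature 0 separates the classes, feature 1 is noise.
train_rows = [[0.1, 0.0], [0.2, 1.0], [0.9, 1.0], [0.8, 0.0]]
train_labels = [0, 0, 1, 1]
test_nodes = [("P1", [0.15, 0.5]), ("P2", [0.85, 0.5])]

labels, scores = random_committee(train_rows, train_labels, test_nodes)
print(labels, scores)
```

Because each map task only needs the training data and its own seed, members can be trained independently and in parallel; averaging probabilities in the reducer, rather than counting hard votes, follows the "straight average" that Weka's RandomCommittee uses.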
Authors and Affiliations
Taysir Soliman