Identifying Cancer Biomarkers Via Node Classification within a Mapreduce Framework
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2015, Vol 6, Issue 12
Abstract
Big data pose new research challenges in the life sciences because of their variety, volume, veracity, velocity, and value. Predicting gene biomarkers is a vital research problem in bioinformatics, where microarray gene expression and network-based methods can be used. These datasets are very large, which causes main-memory problems. In this paper, a Random Committee Node Classifier (RCNC) algorithm is proposed for identifying cancer biomarkers, based on microarray gene expression data and Protein-Protein Interaction (PPI) data. Data are enriched from other public databases, such as IntACT, UniProt, and Gene Ontology (GO). When applied to different datasets, RCNC identifies cancer biomarkers with 99.16% accuracy, 99.96% precision, 99.24% recall, 99.16% F1-measure, and 99.6 ROC. To speed up performance, the algorithm is run within a MapReduce framework, where the MapReduce version of RCNC is much faster than the sequential version on large datasets.
Authors and Affiliations
Taysir Soliman