Implementation of Hadoop Based Framework for Parallel Processing of Biological Data

Journal Title: UNKNOWN - Year 2015, Vol 4, Issue 4

Abstract

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. Hadoop is designed to process large data sets (petabytes), but it becomes a bottleneck when handling massive numbers of small files: the name node uses more memory to store file metadata, and the data nodes consume more CPU time to process the files. This paper presents Optimized Hadoop, built on the open source Apache Hadoop project. It consists of a Merge Model that merges massive numbers of small files into a single large file, introduces an efficient indexing mechanism, and adopts the MapReduce framework, using a decision classification rule for the analysis and diagnosis of Iris Plants data through a distributed file system, to achieve scalable, efficient, and reliable computing performance on Linux clusters of low-cost commodity machines. Our experimental results show that Optimized Hadoop improves the performance of processing small files by up to 90.83% and effectively reduces the memory the name node needs to store file metadata.
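
The merge-and-index idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use any Hadoop API; it is plain in-memory Python, and the names merge_small_files, read_file, and the sample file names are hypothetical. It only shows why one merged file plus an offset index lets a single large object stand in for many small ones.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and build an offset index.

    `files` maps a file name to its bytes (hypothetical input format).
    Returns (blob, index), where index[name] = (offset, length) in the blob.
    """
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        # Record where this file starts in the merged blob and how long it is.
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def read_file(blob, index, name):
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Three tiny records standing in for many small input files (assumed data).
small_files = {
    "sample_a.csv": b"5.1,3.5,1.4,0.2\n",
    "sample_b.csv": b"7.0,3.2,4.7,1.4\n",
    "sample_c.csv": b"6.3,3.3,6.0,2.5\n",
}
merged, index = merge_small_files(small_files)
```

In this sketch the storage layer would track one merged object instead of three, and a lookup such as read_file(merged, index, "sample_b.csv") costs one dictionary access plus a slice. How the paper's index is actually structured is not stated in the abstract.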

  • EP ID EP364166

How To Cite

(2015). Implementation of Hadoop Based Framework for Parallel Processing of Biological Data. UNKNOWN, 4(4), -. https://europub.co.uk/articles/-A-364166