Implementation of Hadoop Based Framework for Parallel Processing of Biological Data

Journal Title: International Journal of Science and Research (IJSR) - Year 2015, Vol 4, Issue 4

Abstract

Bioinformatics faces the challenge that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. Hadoop is designed to process large data sets (petabytes), but it becomes a bottleneck when handling a massive number of small files, because the name node uses more memory to store file metadata and the data nodes consume more CPU time processing the files. This paper presents Optimized Hadoop, built on the open source Apache Hadoop project. It consists of a Merge Model that merges massive numbers of small files into a single large file, introduces an efficient indexing mechanism, and adopts the MapReduce framework with a decision classification rule for the analysis and diagnosis of Iris Plants data through a distributed file system, in order to achieve scalable, efficient, and reliable computing performance on Linux clusters of low-cost commodity machines. Our experimental results show that Optimized Hadoop improves the performance of processing small files drastically, by up to 90.83%, and effectively reduces the name node memory used to store file metadata.
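The idea of merging many small files into one large file with an index can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `merge_small_files` and `read_file`, the in-memory byte strings, and the (offset, length) index layout are all assumptions made for illustration. It only shows why a single merged file plus a compact index needs one metadata entry instead of one per small file.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and build an index.

    `files` maps a file name to its bytes (hypothetical input format).
    The index maps each name to (offset, length) inside the merged blob.
    """
    merged = io.BytesIO()
    index = {}
    for name in sorted(files):          # deterministic layout
        data = files[name]
        index[name] = (merged.tell(), len(data))
        merged.write(data)
    return merged.getvalue(), index


def read_file(merged, index, name):
    """Recover one original file from the merged blob via the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


# Usage: two tiny records merged into one blob, then read back by name.
small = {"a.txt": b"5.1,3.5", "b.txt": b"4.9"}
merged, index = merge_small_files(small)
record = read_file(merged, index, "b.txt")
```

In a real cluster the merged blob would live in the distributed file system and the index would be stored alongside it; both details are left out here.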

How To Cite

(2015). Implementation of Hadoop Based Framework for Parallel Processing of Biological Data. International Journal of Science and Research (IJSR), 4(4), -. https://europub.co.uk/articles/-A-364166