Speed-up Extension to Hadoop System

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 12, Issue 2

Abstract

Most organizations are moving toward Apache Hadoop and HDFS for storing and analyzing very large online or streaming data. Applications such as log processors and search engines use Hadoop MapReduce for computation and HDFS for storage. Hadoop is widely used for storing, processing, and analyzing very large data sets, but the system still leaves considerable room for improvement. This work addresses two problems, one in data storage and one in data processing, in order to improve Hadoop's processing speed and reduce task execution time. First, Hadoop applications require streaming access to data files, yet Hadoop's default placement policy does not consider any characteristics of the data when placing files. If related files are stored on the same set of nodes, efficiency improves and access latency decreases. Second, Hadoop uses the MapReduce framework to implement large-scale distributed computing on unpredictable data sets, and this process can perform duplicate computations. Hadoop has no mechanism to identify such duplicate computations, which increases processing time. To solve the first problem, related files are co-located based on their content: a locality-sensitive hashing algorithm, which is a clustering-based technique, attempts to place related file streams on the same set of nodes without affecting Hadoop's default scalability and fault-tolerance properties. To solve the second problem, a mechanism stores each executed task together with its result; before any task is executed, it is compared against the stored tasks, and if a match is found the stored result is returned directly. Storing related files in the same cluster improves data locality, and avoiding repeated execution of tasks reduces processing time; together these speed up Hadoop execution.
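The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use any Hadoop API. It assumes MinHash as the locality-sensitive hashing family (the abstract does not say which one is used), and every name in it (`placement_group`, `TaskResultCache`, `word_count`, the number of node groups) is hypothetical.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed content feature)."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(text, num_hashes=8):
    """MinHash signature: for each seeded hash, the minimum value over all shingles."""
    parts = shingles(text)
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in parts)
        for seed in range(num_hashes)
    )


def placement_group(text, num_groups=4, band=2):
    """Map file content to a hypothetical node group using the first signature band.

    Files with similar content are likely, not guaranteed, to share a group.
    """
    key = repr(minhash_signature(text)[:band]).encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_groups


class TaskResultCache:
    """Stores executed tasks with their results and reuses them on a repeat."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def run(self, job_name, input_data, compute):
        # A task is identified by its job name and a digest of its input.
        key = (job_name, hashlib.sha256(input_data.encode()).hexdigest())
        if key in self._results:
            self.hits += 1          # duplicate task: return the stored result
            return self._results[key]
        result = compute(input_data)
        self._results[key] = result
        return result


def word_count(text):
    """Toy stand-in for a MapReduce job."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    cache = TaskResultCache()
    cache.run("wordcount", "error warn error", word_count)
    cache.run("wordcount", "error warn error", word_count)  # served from cache
    print("cache hits:", cache.hits)
    print("node group:", placement_group("error warn error info"))
```

Identifying a task by job name plus input digest is one possible design choice; a real system would also have to account for job parameters and for input files that change after a result is stored.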

Authors and Affiliations

Sayali Ashok Shivarkar

EP ID EP142234