Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS, the Hadoop Distributed File System, is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS and MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single Namenode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the Namenode [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the Namenode and also helps increase performance by reducing the number of map operations in a MapReduce program. Hadoop provides the “harballing” archiving technique, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can still be accessed in parallel, transparently (without expanding the archive) and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
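The namespace saving described in the abstract can be illustrated with a rough back-of-the-envelope model. The Python sketch below is not taken from the paper: it assumes that each file and each block costs one namespace object on the Namenode, that an archive consists of a directory holding part files plus two small index files, and that each object costs about 150 bytes of Namenode memory (a commonly quoted rule of thumb, not a measured value). The function names, the 128 MiB block size, and the 2 GiB part size are illustrative choices only.

```python
MIB = 1024 * 1024
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measurement


def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def namespace_objects(file_sizes, block_size):
    """Count one object per file plus one per block (sizes assumed > 0)."""
    return sum(1 + ceil_div(size, block_size) for size in file_sizes)


def archived_objects(file_sizes, block_size, part_size):
    """Estimate objects after packing all files into one archive.

    Assumed layout: one archive directory, N part files that hold the
    packed data, and two index files small enough to fit in one block each.
    """
    total = sum(file_sizes)
    n_parts = ceil_div(total, part_size)
    part_sizes = [part_size] * (n_parts - 1)
    part_sizes.append(total - part_size * (n_parts - 1))
    index_sizes = [1, 1]  # placeholder sizes: each occupies a single block
    return 1 + namespace_objects(part_sizes + index_sizes, block_size)


def estimated_bytes(objects):
    """Convert an object count to approximate Namenode memory."""
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    small_files = [1 * MIB] * 10_000  # hypothetical workload
    before = namespace_objects(small_files, 128 * MIB)
    after = archived_objects(small_files, 128 * MIB, 2048 * MIB)
    print(before, estimated_bytes(before))
    print(after, estimated_bytes(after))
```

Under these assumptions, 10,000 files of 1 MiB each account for 20,000 namespace objects before archiving and 89 after. The model ignores the parent directories of the original files and any per-object differences between inodes and blocks, so the figures indicate the order of magnitude of the saving rather than exact memory use.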

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk/articles/-A-136150