Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS stands for the Hadoop Distributed File System. It is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single NameNode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades NameNode performance [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the NameNode and also helps improve performance by reducing the number of map operations in a MapReduce program. Hadoop provides the “harballing” archiving technique, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can be accessed in parallel, transparently (without expanding the files) and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
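The namespace saving described in the abstract can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper: it assumes a 128 MB block size, a rule-of-thumb cost of about 150 bytes of NameNode heap per namespace object, and a simplified archive layout of one directory, two single-block index files, and a single part file. The function names and constants are hypothetical, chosen only for this illustration.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size in bytes
BYTES_PER_OBJECT = 150          # assumed rule-of-thumb heap cost per namespace object


def namespace_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count file and block objects when every file is stored individually."""
    # Each file contributes one file object plus one object per block it occupies.
    return sum(1 + math.ceil(size / block_size) for size in file_sizes)


def archived_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count objects when the same files are packed into one simplified archive."""
    total = sum(file_sizes)
    part_blocks = max(1, math.ceil(total / block_size))
    archive_dir = 1
    index_files = 2 * (1 + 1)      # two index files, assumed one block each
    part_file = 1 + part_blocks    # one part file plus its blocks
    return archive_dir + index_files + part_file


if __name__ == "__main__":
    small_files = [1024 * 1024] * 10_000  # 10,000 files of 1 MB each
    before = namespace_objects(small_files)
    after = archived_objects(small_files)
    print("objects before archiving:", before)
    print("objects after archiving: ", after)
    print("approx. heap before (bytes):", before * BYTES_PER_OBJECT)
    print("approx. heap after  (bytes):", after * BYTES_PER_OBJECT)
```

Under these assumptions, 10,000 one-megabyte files drop from 20,000 namespace objects to 85 once archived. The exact figures depend on the cluster's block size and on how many part files the archive actually contains, so the numbers should be read as an order-of-magnitude estimate only.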

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk/articles/-A-136150