Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS stands for the Hadoop Distributed File System. It is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single NameNode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the NameNode [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the NameNode and also helps improve performance by reducing the number of map operations in a MapReduce program. Hadoop provides the “harballing” archiving technique, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can be accessed in parallel, transparently (without expanding the files) and efficiently. Hadoop creates the archive with a “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
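To make the namespace argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' measurement method. It assumes that the NameNode keeps one in-memory object per file and one per block, and it uses illustrative figures that do not come from the paper: roughly 150 bytes per namespace object (a commonly quoted rule of thumb), a 128 MiB block size, and a hypothetical `part_size` for the archive's part files. The archive is modelled simply as two index files (`_index`, `_masterindex`) plus the concatenated part files.

```python
import math

# Assumed rule-of-thumb figures (illustrative only, not taken from the paper).
BYTES_PER_NAMESPACE_OBJECT = 150   # approximate NameNode heap per file or block object
BLOCK_SIZE = 128 * 1024 * 1024     # assumed HDFS block size in bytes


def namespace_objects(file_sizes):
    """Count NameNode objects: one per file plus one per block of each file."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)
    return len(file_sizes) + blocks


def har_namespace_objects(file_sizes, part_size=BLOCK_SIZE):
    """Estimate NameNode objects after packing the files into one archive.

    Simplified model (an assumption): the data is concatenated into part
    files of at most part_size bytes, next to two small index files, all
    inside a single .har directory.
    """
    total = sum(file_sizes)
    part_count = max(1, math.ceil(total / part_size))
    last_part = total - part_size * (part_count - 1)
    part_sizes = [part_size] * (part_count - 1) + [last_part]
    index_sizes = [1, 1]  # placeholder sizes; each index file occupies one block
    return namespace_objects(part_sizes + index_sizes) + 1  # +1 for the .har directory


def estimated_heap_bytes(object_count):
    """Convert an object count into an approximate NameNode heap footprint."""
    return object_count * BYTES_PER_NAMESPACE_OBJECT


if __name__ == "__main__":
    small_files = [1024 * 1024] * 100_000  # 100,000 files of 1 MiB each
    before = namespace_objects(small_files)
    after = har_namespace_objects(small_files)
    print("objects before archiving:", before)
    print("objects after archiving: ", after)
    print("estimated heap before (bytes):", estimated_heap_bytes(before))
    print("estimated heap after (bytes): ", estimated_heap_bytes(after))
```

Under this model the object count depends on the number of part files rather than on the number of original small files, which is the effect the abstract describes. The actual saving on a real cluster depends on the configured block and part-file sizes.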

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk/articles/-A-136150