Reduction of Data at Namenode in HDFS using harballing Technique  

Abstract

HDFS, the Hadoop Distributed File System, is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is that small files are common in these applications. HDFS manages all of these small files through a single NameNode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and degrades NameNode performance [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the NameNode and improves performance by reducing the number of map operations in a MapReduce program. Hadoop provides an archiving technique, “harballing”, that collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can still be accessed in parallel, transparently (without expanding the archive) and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
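As a rough illustration of why packing small files reduces namespace usage, the sketch below counts the objects (files, blocks, directories) the NameNode would track before and after archiving. It is a back-of-the-envelope model, not the paper's method. The function names, the 128 MB block size, the 2 GB part-file size, and the assumption that the two index files of an archive occupy one block each are all illustrative assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def objects_without_har(num_files, file_size, block_size):
    """One file object plus its block objects for every small file."""
    blocks_per_file = max(1, ceil_div(file_size, block_size))
    return num_files * (1 + blocks_per_file)

def objects_with_har(num_files, file_size, block_size, part_size):
    """Objects after packing everything into one archive.

    Assumptions (hypothetical, for illustration only):
    - part_size is a multiple of block_size, so the packed data
      occupies ceil(total / block_size) blocks overall;
    - the archive holds two index files of one block each;
    - the archive itself counts as one directory object.
    """
    total_bytes = num_files * file_size
    part_files = max(1, ceil_div(total_bytes, part_size))
    data_blocks = max(1, ceil_div(total_bytes, block_size))
    index_objects = 2 * (1 + 1)   # two index files, one block each
    archive_dir = 1
    return part_files + data_blocks + index_objects + archive_dir

MB = 1024 * 1024
before = objects_without_har(10_000, 1 * MB, 128 * MB)
after = objects_with_har(10_000, 1 * MB, 128 * MB, 2048 * MB)
print(before, after)  # 20000 89
```

Under these assumptions, 10,000 files of 1 MB each drop from 20,000 namespace objects to 89. Real savings depend on the actual file sizes and cluster configuration.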

Authors and Affiliations

Vaibhav G. Korat, Kumar Swamy Pamu

How To Cite

Vaibhav G. Korat, Kumar Swamy Pamu (2012). Reduction of Data at Namenode in HDFS using harballing Technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(4), 635-642. https://europub.co.uk/articles/-A-136150