Reduction of Data at Namenode in HDFS using harballing Technique
Journal Title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Year 2012, Vol 1, Issue 4
Abstract
HDFS, the Hadoop Distributed File System, is designed to handle large files (megabytes, gigabytes, or terabytes in size). Scientific applications have adopted HDFS/MapReduce for large-scale data analytics [1]. A major problem, however, is the small files that are common in these applications. HDFS manages all of these small files through a single NameNode server [1]-[4]. Storing and processing small files in HDFS adds overhead to MapReduce programs and also degrades the performance of the NameNode [1]-[3]. In this paper we study the Hadoop archiving technique, which reduces the storage overhead of data on the NameNode and also improves performance by reducing the number of map operations in a MapReduce program. Hadoop provides an archiving technique called “harballing”, which collects a large number of small files into a single large file. Hadoop Archive (HAR) is an effective solution to the problem of many small files. HAR packs a number of small files into large files so that the original files can be accessed in parallel, transparently (without expanding the files), and efficiently. Hadoop creates the archive file with the “.har” extension. HAR increases the scalability of the system by reducing namespace usage and decreasing the operation load on the NameNode. This improvement is orthogonal to memory optimization in the NameNode and to distributing namespace management across multiple NameNodes [3].
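The namespace saving described above can be illustrated with a rough back-of-the-envelope sketch. It is not taken from the paper: the function names, the example sizes, and the figure of about 150 bytes of NameNode heap per namespace object are all assumptions (the last is a commonly quoted rule of thumb). The model counts one object per file inode and one per block, and treats an archive as one directory, two index files, and a set of part files.

```python
# Rough model of NameNode namespace objects before and after archiving.
# All names and numbers here are illustrative assumptions, not from the paper.

BYTES_PER_OBJECT = 150  # assumed rule-of-thumb heap cost per namespace object


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def plain_objects(num_files, file_size, block_size):
    """One inode per file plus one object per block the file occupies."""
    blocks_per_file = max(1, ceil_div(file_size, block_size))
    return num_files * (1 + blocks_per_file)


def har_objects(num_files, file_size, block_size, part_size):
    """Objects for one archive: 1 directory, 2 index files, N part files.

    Assumes each index file fits in one block and that part_size is a
    multiple of block_size, so blocks can be counted over the total bytes.
    """
    total_bytes = num_files * file_size
    num_parts = max(1, ceil_div(total_bytes, part_size))
    part_blocks = max(1, ceil_div(total_bytes, block_size))
    index_objects = 2 + 2  # two index inodes, one block each (assumed)
    return 1 + index_objects + num_parts + part_blocks


MIB = 1024 * 1024
before = plain_objects(100_000, 1 * MIB, 128 * MIB)
after = har_objects(100_000, 1 * MIB, 128 * MIB, 2048 * MIB)
print("objects before archiving:", before)
print("objects after archiving: ", after)
print("estimated heap saved (bytes):", (before - after) * BYTES_PER_OBJECT)
```

The model ignores the parent directories of the original files and any bookkeeping files the archiving job may leave behind, so it only indicates the order of magnitude of the reduction.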
Authors and Affiliations
Vaibhav G. Korat, Kumar Swamy Pamu