P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2016, Vol 15, Issue 8

Abstract

The increasing amount and size of data handled by data analytic applications running on Hadoop have created a need for faster data processing. One effective method for handling big data is compression. Data compression not only makes network I/O faster but also provides better utilization of resources. However, this approach defeats one of Hadoop's main strengths: the parallelism of map and reduce tasks. The number of map tasks created is determined by the size of the input file, so compressing a large file reduces the number of mappers, which in turn decreases parallelism. Consequently, standard Hadoop takes longer to process compressed data. In this paper, we propose the design and implementation of a Parallel Compressed File Decompressor (P-Codec) that improves the performance of Hadoop when processing compressed data. P-Codec includes two modules. The first module decompresses data as it is received by a data node while being uploaded to the Hadoop Distributed File System (HDFS); this reduces job runtime by removing the burden of decompression from the MapReduce phase. The second module is a decompressed map task divider that increases parallelism by dynamically adjusting the map task split sizes based on the size of the final decompressed block. Our experimental results with five different MapReduce benchmarks show an average improvement of approximately 80% over standard Hadoop.
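
As a rough illustration of the first module's idea (moving decompression from the map phase to the HDFS upload path), the minimal sketch below streams a compressed local file through a Hadoop codec and writes the decompressed bytes to HDFS, using only standard Hadoop I/O APIs (CompressionCodecFactory, IOUtils). The class name DecompressOnUpload and the client-side command-line driver are assumptions made here for illustration; the paper's actual module runs inside the data node's upload path rather than as a client-side copy.

```java
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Hypothetical client-side sketch of the decompress-on-upload idea:
// after this copy, the data sitting in HDFS is plain and splittable,
// so no map task has to pay the decompression cost later.
public class DecompressOnUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem localFs = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(conf);

        Path src = new Path(args[0]); // local compressed file, e.g. input.gz
        Path dst = new Path(args[1]); // HDFS destination for decompressed data

        // Infer the codec (gzip, bzip2, ...) from the file extension.
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(src);
        if (codec == null) {
            throw new IllegalArgumentException("No compression codec found for " + src);
        }

        // Stream local compressed bytes through the decompressor into HDFS.
        try (InputStream in = codec.createInputStream(localFs.open(src));
             OutputStream out = hdfs.create(dst)) {
            IOUtils.copyBytes(in, out, conf);
        }
    }
}
```

The second module's effect can be loosely approximated on the job side by capping input split sizes relative to the decompressed file length, for example via the standard Hadoop property mapreduce.input.fileinputformat.split.maxsize (or FileInputFormat.setMaxInputSplitSize), so that a large decompressed file yields proportionally more map tasks; the exact split-sizing policy P-Codec uses is described in the paper itself.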

Authors and Affiliations

Idris Hanafi, Amal Abdel-Raouf

DOI: 10.24297/ijct.v15i8.1500

How To Cite

Idris Hanafi, Amal Abdel-Raouf (2016). P-Codec: Parallel Compressed File Decompression Algorithm for Hadoop. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 15(8), 6991-6998. https://europub.co.uk/articles/-A-650853