Efficient Processing of XML Documents in Hadoop Map Reduce

Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to create a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to deal with large amounts of data in XML format.
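To illustrate what a deserializer for XML might do conceptually, the following is a minimal Python sketch, not the authors' implementation and not Hive code. It assumes each input record is one small XML fragment and that table columns are declared as paths into that fragment. The column names, the `COLUMN_PATHS` mapping, and the `deserialize` function are all hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical column declarations (not from the paper): each column maps to
# an element path relative to the record root plus an optional attribute name.
COLUMN_PATHS = {
    "order_id": (".", "id"),
    "customer": ("customer", None),
    "total": ("summary/total", None),
}


def deserialize(record_xml, column_paths=COLUMN_PATHS):
    """Turn one XML record into a flat row (dict of column -> string or None)."""
    root = ET.fromstring(record_xml)
    row = {}
    for column, (path, attribute) in column_paths.items():
        # "." refers to the record root itself; other paths are looked up below it.
        element = root if path == "." else root.find(path)
        if element is None:
            row[column] = None  # missing element becomes a null column
        elif attribute is not None:
            row[column] = element.get(attribute)  # None if the attribute is absent
        else:
            text = (element.text or "").strip()
            row[column] = text or None
    return row


if __name__ == "__main__":
    record = (
        '<order id="42"><customer>Acme</customer>'
        "<summary><total>19.99</total></summary></order>"
    )
    print(deserialize(record))
```

Declaring the column-to-path mapping once, rather than writing a custom function per query, is the kind of generic handling the abstract contrasts with per-use-case "custom solutions"; the actual mechanism is described in the paper itself.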

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

How To Cite

Dmitry Vasilenko, Mahesh Kurapati (2014). Efficient Processing of XML Documents in Hadoop Map Reduce. International Journal on Computer Science and Engineering, 6(9), 329-333. https://europub.co.uk/articles/-A-94484