Efficient Processing of XML Documents in Hadoop Map Reduce

Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to create a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to deal with large amounts of data in XML format.

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

How To Cite

Dmitry Vasilenko, Mahesh Kurapati (2014). Efficient Processing of XML Documents in Hadoop Map Reduce. International Journal on Computer Science and Engineering, 6(9), 329-333. https://europub.co.uk/articles/-A-94484