Efficient Processing of XML Documents in Hadoop Map Reduce

Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. Although a number of tools and technologies for processing XML are readily available, the common approach in MapReduce environments is to build a "custom solution", for example one based on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers (SerDes) for other popular data formats, such as JSON, and makes it much easier for users to work with large amounts of data in XML format.
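The abstract describes the approach only at the architectural level. As a rough illustration of the two jobs an XML deserializer of this kind has to perform, here is a plain-JDK sketch with no Hive dependencies. It first carves individual records out of a larger XML stream using a start and end tag, and then maps each record to a row of column values using one XPath expression per column. The class name `XmlRowDeserializer`, the methods `splitRecords` and `deserialize`, and the column-to-XPath mapping are hypothetical illustrations. They are not the authors' implementation and not the Hive SerDe API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch (not the paper's implementation): turns XML records
 * into rows by evaluating one XPath expression per column.
 */
class XmlRowDeserializer {
    private final List<XPathExpression> columnExpressions = new ArrayList<>();
    // DocumentBuilder is not thread-safe, so use one deserializer per thread.
    private final DocumentBuilder builder;

    /** columnXPaths maps column name to XPath; iteration order defines column order. */
    XmlRowDeserializer(Map<String, String> columnXPaths)
            throws ParserConfigurationException, XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String expression : columnXPaths.values()) {
            // Compile each expression once up front instead of once per record.
            columnExpressions.add(xpath.compile(expression));
        }
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Parses one complete XML record and returns its column values as strings. */
    List<String> deserialize(String record)
            throws IOException, SAXException, XPathExpressionException {
        Document doc = builder.parse(new InputSource(new StringReader(record)));
        List<String> row = new ArrayList<>();
        for (XPathExpression expr : columnExpressions) {
            // evaluate(Object) returns the XPath string-value ("" if nothing matches).
            row.add(expr.evaluate(doc));
        }
        return row;
    }

    /**
     * Splits a larger XML stream into records delimited by startTag/endTag.
     * Simplifications: startTag is matched as a plain prefix (so "<customer"
     * would also match "<customers"), nested elements with the same name are
     * not supported, and a trailing record without an end tag is dropped.
     */
    static List<String> splitRecords(String data, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = data.indexOf(startTag, from);
            if (start < 0) {
                break;
            }
            int end = data.indexOf(endTag, start);
            if (end < 0) {
                break;
            }
            int stop = end + endTag.length();
            records.add(data.substring(start, stop));
            from = stop;
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("customer_id", "/customer/@id");
        mapping.put("name", "/customer/name");
        XmlRowDeserializer deserializer = new XmlRowDeserializer(mapping);

        String data = "<batch>"
                + "<customer id=\"1\"><name>Ann</name></customer>"
                + "<customer id=\"2\"><name>Bob</name></customer>"
                + "</batch>";
        for (String record : splitRecords(data, "<customer", "</customer>")) {
            System.out.println(deserializer.deserialize(record));
        }
    }
}
```

Running `main` prints one row per customer record: `[1, Ann]` and `[2, Bob]`. In a MapReduce job the splitting step would typically live in a custom InputFormat/RecordReader, and a real Hive SerDe would additionally implement Hive's SerDe interfaces and expose typed values through an ObjectInspector. That plumbing is omitted here.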

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

  • EP ID EP94484

How To Cite

Dmitry Vasilenko, Mahesh Kurapati (2014). Efficient Processing of XML Documents in Hadoop Map Reduce. International Journal on Computer Science and Engineering, 6(9), 329-333. https://europub.co.uk/articles/-A-94484