Efficient Processing of XML Documents in Hadoop Map Reduce

Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to build a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to deal with large amounts of data in XML format.
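
As a rough illustration of what a generic, mapping-driven deserializer does, the sketch below turns one XML record into a flat row by looking up a configured path for each column. It is not the authors' implementation and does not use any Hive API; the `COLUMNS` mapping, the `deserialize` function, and the `customer` record layout are all hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical column mapping: (column name, child path or None, attribute or None).
# A path of None means "the record element itself".
COLUMNS = [
    ("customer_id", None, "id"),
    ("name", "name", None),
    ("city", "address/city", None),
]


def deserialize(record_xml):
    """Turn one XML record into a tuple with one value per configured column."""
    root = ET.fromstring(record_xml)
    row = []
    for _column, path, attribute in COLUMNS:
        node = root if path is None else root.find(path)
        if node is None:
            # Semi-structured input: a missing element becomes a NULL-like value.
            row.append(None)
        elif attribute is not None:
            row.append(node.get(attribute))
        else:
            row.append(node.text)
    return tuple(row)


if __name__ == "__main__":
    record = (
        '<customer id="42"><name>Ada</name>'
        "<address><city>London</city></address></customer>"
    )
    print(deserialize(record))
```

Because the mapping is data rather than code, the same function can serve differently shaped documents by changing `COLUMNS` alone, which is the sense in which such an approach is generic rather than a per-dataset custom solution.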

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

How To Cite

Dmitry Vasilenko, Mahesh Kurapati (2014). Efficient Processing of XML Documents in Hadoop Map Reduce. International Journal on Computer Science and Engineering, 6(9), 329-333. https://europub.co.uk/articles/-A-94484