Efficient Processing of XML Documents in Hadoop Map Reduce


Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9

Abstract

XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. Although a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to create a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to work with large amounts of data in XML format.
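The abstract gives no implementation details, so the following is only a rough sketch of the general idea behind an XPath-driven deserializer: split a document into records by start and end tags, then map each record to a flat row through a column-to-path table. It uses Python's standard library rather than Hive or VTD-XML, and every name in it (`COLUMN_XPATHS`, `split_records`, `deserialize_record`, the sample `customer` data) is a hypothetical chosen for illustration, not the paper's API.

```python
import xml.etree.ElementTree as ET

# Hypothetical column-to-path mapping; names and paths are illustrative only.
COLUMN_XPATHS = {
    "id": "./id",
    "name": "./name",
    "city": "./address/city",
}


def split_records(document, start_tag, end_tag):
    """Yield each substring that runs from start_tag through end_tag."""
    pos = 0
    while True:
        start = document.find(start_tag, pos)
        if start == -1:
            return
        end = document.find(end_tag, start)
        if end == -1:
            return
        pos = end + len(end_tag)
        yield document[start:pos]


def deserialize_record(xml_text, column_xpaths):
    """Turn one XML record into a flat row: column name -> text, or None if absent."""
    root = ET.fromstring(xml_text)
    return {col: root.findtext(path) for col, path in column_xpaths.items()}


# Made-up sample data, not taken from the paper.
SAMPLE = (
    "<customers>"
    "<customer><id>1</id><name>Ada</name>"
    "<address><city>London</city></address></customer>"
    "<customer><id>2</id><name>Lin</name></customer>"
    "</customers>"
)

rows = [
    deserialize_record(record, COLUMN_XPATHS)
    for record in split_records(SAMPLE, "<customer>", "</customer>")
]
for row in rows:
    print(row)
```

A missing element yields `None` rather than an error, which is one simple way to cope with the semi-structured nature of XML that the abstract mentions. Parsing each record separately also keeps memory use proportional to the record size rather than to the whole document.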

Authors and Affiliations

Dmitry Vasilenko, Mahesh Kurapati

Keywords

XML, Apache Hadoop, Apache Hive, Map-Reduce, VTD-XML, XPath

