Efficient Processing of XML Documents in Hadoop Map Reduce
Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 9
Abstract
XML has dominated the enterprise landscape for fifteen years and remains the most commonly used data format. Despite its popularity, using XML for "Big Data" is challenging because of its semi-structured nature, its rather demanding memory requirements, and its lack of support for some complex data structures such as maps. While a number of tools and technologies for processing XML are readily available, the common approach in map-reduce environments is to create a "custom solution" based, for example, on Apache Hive User Defined Functions (UDFs). Because XML processing is a common use case, this paper describes a generic approach to handling XML based on the Apache Hive architecture. The described functionality complements the existing family of Hive serializers/deserializers for other popular data formats, such as JSON, and makes it much easier for users to deal with large amounts of data in XML format.
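To illustrate the deserialization idea summarized above, the following minimal sketch turns one XML record into a flat row, including a map-valued column rebuilt from repeated elements. It is an illustration in plain Python under stated assumptions, not the paper's Hive implementation: the record layout, the `COLUMN_PATHS` mapping, the `ATTRIBUTE_PATH` expression, and the `deserialize` function are all hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical column-to-path mapping; names and paths are illustrative only.
COLUMN_PATHS = {
    "id": "./id",
    "name": "./name",
}
# Hypothetical path selecting the repeated elements folded into a map column.
ATTRIBUTE_PATH = "./attributes/attribute"


def deserialize(record_xml):
    """Turn one XML record into a flat row (dict) with one map-valued column."""
    root = ET.fromstring(record_xml)
    # Scalar columns: take the text of the first element matching each path.
    row = {col: root.findtext(path) for col, path in COLUMN_PATHS.items()}
    # XML has no native map type, so key/value pairs are rebuilt from repeated
    # elements: the 'key' attribute becomes the key, the element text the value.
    row["attributes"] = {
        el.get("key"): el.text for el in root.findall(ATTRIBUTE_PATH)
    }
    return row


record = (
    "<record><id>42</id><name>widget</name>"
    "<attributes>"
    "<attribute key='color'>red</attribute>"
    "<attribute key='size'>L</attribute>"
    "</attributes></record>"
)
row = deserialize(record)
print(row)
```

In a map-reduce setting, a function of this kind would be applied to each record independently, which is what makes the per-record mapping from XML to rows a natural fit for a serializer/deserializer layer.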
Authors and Affiliations
Dmitry Vasilenko, Mahesh Kurapati