Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets

Journal Title: International Journal of Modern Engineering Research (IJMER) - Year 2013, Vol 3, Issue 4

Abstract

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g. point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: CASE, which exploits the programming CASE construct; SPJ, which is based on standard relational algebra operators (SPJ queries); and PIVOT, which uses the PIVOT operator offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not.
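The abstract names the evaluation methods but gives no SQL, so the following is only a minimal sketch of how the CASE and SPJ strategies might look when the SQL is generated from the data. It uses Python's built-in `sqlite3` module; the `sales` table, its columns, and the two helper functions are hypothetical names chosen for illustration, and identifiers are interpolated without sanitizing. SQLite has no PIVOT operator, so that method is not shown.

```python
import sqlite3


def distinct_values(conn, table, pivot_col):
    # Each distinct pivot value becomes one output column.
    query = f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY {pivot_col}"
    return [row[0] for row in conn.execute(query)]


def horizontal_sum_case(conn, table, group_col, pivot_col, value_col):
    # CASE strategy (hypothetical helper): one SUM(CASE ...) per pivot value,
    # evaluated in a single GROUP BY pass over the table.
    values = distinct_values(conn, table, pivot_col)
    cols = ", ".join(
        f"SUM(CASE WHEN {pivot_col} = ? THEN {value_col} ELSE NULL END)"
        for _ in values
    )
    sql = (f"SELECT {group_col}, {cols} FROM {table} "
           f"GROUP BY {group_col} ORDER BY {group_col}")
    return conn.execute(sql, values).fetchall()


def horizontal_sum_spj(conn, table, group_col, pivot_col, value_col):
    # SPJ strategy (hypothetical helper): one filtered aggregate per pivot
    # value, left-joined onto the list of distinct group keys.
    values = distinct_values(conn, table, pivot_col)
    select = [f"F0.{group_col}"]
    joins = []
    for i in range(1, len(values) + 1):
        select.append(f"F{i}.total")
        joins.append(
            f"LEFT JOIN (SELECT {group_col}, SUM({value_col}) AS total "
            f"FROM {table} WHERE {pivot_col} = ? GROUP BY {group_col}) AS F{i} "
            f"ON F0.{group_col} = F{i}.{group_col}"
        )
    sql = (f"SELECT {', '.join(select)} "
           f"FROM (SELECT DISTINCT {group_col} FROM {table}) AS F0 "
           + " ".join(joins) + f" ORDER BY F0.{group_col}")
    return conn.execute(sql, values).fetchall()


# Illustrative data only: a vertical table with one row per (store, day) sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "Mon", 10), (1, "Mon", 5), (1, "Tue", 7), (2, "Tue", 3), (2, "Wed", 4)],
)

rows_case = horizontal_sum_case(conn, "sales", "store_id", "day", "amount")
rows_spj = horizontal_sum_spj(conn, "sales", "store_id", "day", "amount")
print(rows_case)  # one row per store, one column per day
```

Both helpers return one row per `store_id` with one summed column per distinct `day`, and a missing combination shows up as `None`. The CASE version reads the table once, while the SPJ version builds one subquery and one join per output column, which is consistent with the abstract's observation that SPJ scales worse.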

Authors and Affiliations

B Susrutha, J. Vamsi Nath

How To Cite

B Susrutha, J. Vamsi Nath (2013). Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets. International Journal of Modern Engineering Research (IJMER), 3(4), 1861-1871. https://europub.co.uk/articles/-A-141140