Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets

Journal Title: International Journal of Modern Engineering Research (IJMER) - Year 2013, Vol 3, Issue 4

Abstract

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries that join tables and aggregate columns. Existing SQL aggregations are limited for preparing data sets because they return one aggregated value (one row) per group. In general, building data sets that require a horizontal layout takes significant manual effort. We propose simple yet powerful methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal, denormalized layout (e.g. point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms.

We propose three fundamental methods to evaluate horizontal aggregations:
  • CASE: exploits the programming CASE construct;
  • SPJ: based on standard relational algebra operators (select-project-join queries);
  • PIVOT: uses the PIVOT operator, which is offered by some DBMSs.

Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not.
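The abstract names the CASE and SPJ evaluation strategies but does not show what the generated SQL looks like. The Python sketch below is a hypothetical illustration, not the authors' code. It generates both kinds of query for a made-up vertical table `sales(store, quarter, amount)` and runs them in SQLite through the standard `sqlite3` module. The function names (`case_query`, `spj_query`, `distinct_values`, `quote_literal`, `make_demo_db`), the table and column names, and the `<measure>_<value>` naming of output columns are all assumptions. PIVOT is omitted because SQLite has no PIVOT operator. Identifiers are interpolated directly into the SQL, so the sketch assumes trusted table and column names and grouping values that form valid identifier suffixes.

```python
import sqlite3

# Hypothetical sketch: table, column and function names are illustrative
# assumptions, not taken from the paper.


def quote_literal(value):
    """Render a value as a SQL literal; single quotes inside strings are doubled."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def distinct_values(conn, table, col):
    """Sorted distinct values of `col`; each one becomes an output column."""
    rows = conn.execute(f"SELECT DISTINCT {col} FROM {table} ORDER BY {col}")
    return [row[0] for row in rows]


def case_query(conn, table, left, right, measure, agg="SUM"):
    """CASE method: a single GROUP BY scan with one CASE term per value of `right`."""
    terms = [
        f"{agg}(CASE WHEN {right} = {quote_literal(v)} THEN {measure} ELSE NULL END)"
        f" AS {measure}_{v}"
        for v in distinct_values(conn, table, right)
    ]
    return (f"SELECT {left}, {', '.join(terms)} FROM {table} "
            f"GROUP BY {left} ORDER BY {left}")


def spj_query(conn, table, left, right, measure, agg="SUM"):
    """SPJ method: one filtered aggregate per value, left-outer-joined to the distinct keys F0."""
    select = [f"F0.{left}"]
    joins = []
    for i, v in enumerate(distinct_values(conn, table, right), start=1):
        select.append(f"F{i}.a AS {measure}_{v}")
        joins.append(
            f"LEFT OUTER JOIN (SELECT {left}, {agg}({measure}) AS a FROM {table}"
            f" WHERE {right} = {quote_literal(v)} GROUP BY {left}) F{i}"
            f" ON F0.{left} = F{i}.{left}"
        )
    return (f"SELECT {', '.join(select)} "
            f"FROM (SELECT DISTINCT {left} FROM {table}) F0 "
            f"{' '.join(joins)} ORDER BY F0.{left}")


def make_demo_db():
    """Hypothetical vertical (normalized) fact table: one row per sale."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("s1", "Q1", 10.0), ("s1", "Q1", 5.0), ("s1", "Q2", 7.0),
         ("s2", "Q2", 3.0), ("s2", "Q3", 4.0)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for build in (case_query, spj_query):
        sql = build(conn, "sales", "store", "quarter", "amount")
        # Each iteration prints [('s1', 15.0, 7.0, None), ('s2', None, 3.0, 4.0)]
        print(conn.execute(sql).fetchall())
```

In this sketch the CASE query reads `sales` once in a single GROUP BY, whereas the SPJ query runs one filtered aggregation per distinct quarter plus a chain of outer joins. That structural difference is consistent with, though not a measurement of, the abstract's report that CASE is much faster than SPJ. A missing combination (store `s2` has no Q1 sales) comes back as NULL under both methods.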

Authors and Affiliations

B Susrutha, J. Vamsi Nath

How To Cite

B Susrutha, J. Vamsi Nath (2013). Horizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets. International Journal of Modern Engineering Research (IJMER), 3(4), 1861-1871. https://europub.co.uk/articles/-A-141140