Improving the Performance of K-Means Clustering For High Dimensional Data Set
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 6
Abstract
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such data are hard to reason about, impossible to visualize, and, because the number of possible values grows exponentially with each added dimension, impossible to enumerate. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed with an efficient dimensionality reduction method such as Principal Component Analysis (PCA). Cluster analysis in high-dimensional data is the process of fast identification and efficient description of clusters. The clusters have to be of high quality with regard to a suitably chosen homogeneity measure. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters represented by their centroids. It is difficult to compare the quality of the clusters produced. Different initial partitions can result in different final clusters. Hence, in this paper we propose using Principal Component Analysis to reduce the data set from high-dimensional to low-dimensional. The new method is then used to find the initial centroids, making the algorithm more effective and efficient. Comparing the results of the original and proposed methods shows that the results obtained from the proposed method are more accurate.
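The pipeline described in the abstract (reduce with PCA, derive initial centroids, then run K-means) might look roughly like the following sketch. The abstract does not say how the initial centroids are computed, so the `initial_centroids` rule here (sort points along the first principal component, split them into k equal groups, and average each group) is purely an illustrative assumption, not the authors' method. The function names, the synthetic data, and the choice of two components are likewise hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components using an SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def initial_centroids(Z, k):
    """ASSUMPTION (not from the paper): order points along the first
    principal component, cut them into k equal groups, average each group."""
    order = np.argsort(Z[:, 0])
    return np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, k)])

def kmeans(Z, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    for _ in range(max_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Synthetic example: two well-separated groups in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)),
               rng.normal(8.0, 1.0, (30, 50))])

Z = pca_reduce(X, n_components=2)        # 50 dimensions -> 2 dimensions
labels, centers = kmeans(Z, initial_centroids(Z, k=2))
```

Because the starting centroids are computed deterministically from the data, repeated runs on the same input give the same clustering, which addresses the sensitivity to initial partitions that the abstract mentions.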
Authors and Affiliations
P. Prabhu, N. Anbazhagan