Improving the Performance of K-Means Clustering For High Dimensional Data Set
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 6
Abstract
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Data in many dimensions are hard to reason about and impossible to visualize directly. Because the number of possible values grows exponentially with each dimension, they are also impossible to enumerate. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed with an efficient dimensionality reduction method such as Principal Component Analysis (PCA). Cluster analysis in high-dimensional data can be viewed as the process of quickly identifying and efficiently describing clusters. The clusters must be of high quality with respect to a suitably chosen homogeneity measure. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters, each represented by its centroid. However, the quality of the clusters it produces is difficult to compare, because different initial partitions can lead to different final clusters. In this paper, we therefore propose using Principal Component Analysis to reduce the data set from high to low dimensionality, and we use this new method to find the initial centroids, making the algorithm more effective and efficient. Comparing the results of the original and proposed methods, we found that the proposed method produces more accurate results.
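The abstract does not specify how the proposed method derives the initial centroids from the PCA output. The following is a minimal pure-Python sketch of the overall pipeline under one plausible scheme, which is an assumption and not necessarily the authors' method. It computes PCA by power iteration with deflation and reduces the data to k components. It then sorts the points by their first principal-component score, splits them into k equal-size groups, and uses each group's mean as an initial centroid for standard Lloyd k-means. The function names (`pca_project`, `pca_initial_centroids`, `kmeans`) and the synthetic data are hypothetical, for illustration only.

```python
import math
import random


def mean_center(data):
    """Subtract each feature's mean from every row."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvectors(cov, k, iters=500):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    d = len(cov)
    mat = [row[:] for row in cov]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # fixed, deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is zero; keep the current unit vector
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue belonging to this eigenvector.
        lam = sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d))
        vectors.append(v)
        # Deflate so the next pass converges to the next-largest component.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vectors


def pca_project(data, k):
    """Reduce rows of `data` to their scores on the first k principal components."""
    centered = mean_center(data)
    comps = top_eigenvectors(covariance(centered), k)
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in centered]


def pca_initial_centroids(points, k):
    """Hypothetical seeding: sort by first-PC score, cut into k groups, use group means."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = len(points) // k
    centroids = []
    for g in range(k):
        idx = order[g * size:] if g == k - 1 else order[g * size:(g + 1) * size]
        centroids.append([sum(col) / len(idx) for col in zip(*(points[i] for i in idx))])
    return centroids


def nearest(p, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to p."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [c[:] for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage: two synthetic 10-D groups, reduced to 2-D, then clustered with k = 2.
rng = random.Random(0)
data = [[rng.gauss(0.0, 0.5) for _ in range(10)] for _ in range(20)]
data += [[rng.gauss(5.0, 0.5) for _ in range(10)] for _ in range(20)]
reduced = pca_project(data, 2)
labels, _ = kmeans(reduced, pca_initial_centroids(reduced, 2))
print(labels)
```

Because the seeding is deterministic, this variant always produces the same clustering for the same input. Random initialization, by contrast, can converge to different partitions, which is the weakness the abstract points out. Power iteration is used only to keep the sketch dependency-free; a production implementation would normally call a linear-algebra library's eigen-solver or SVD instead.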
Authors and Affiliations
P. Prabhu, N. Anbazhagan