Improving the Performance of K-Means Clustering For High Dimensional Data Set
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 6
Abstract
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such data are hard to reason about, impossible to visualize, and, because the number of possible values grows exponentially with each added dimension, impossible to enumerate. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed by an efficient dimensionality reduction method such as Principal Component Analysis (PCA). Cluster analysis in high-dimensional data is the process of fast identification and efficient description of clusters. The clusters have to be of high quality with regard to a suitably chosen homogeneity measure. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters represented by their centroids. There is a difficulty in comparing the quality of the clusters produced. Different initial partitions can result in different final clusters. Hence, in this paper we propose to use Principal Component Analysis to reduce the data set from high-dimensional to low-dimensional. The new method is then used to find the initial centroids, making the algorithm more effective and efficient. Comparing the results of the original and proposed methods, we found that the results obtained from the proposed method are more accurate.
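The pipeline the abstract describes (reduce with PCA, then seed and run K-means in the reduced space) can be illustrated with a minimal NumPy sketch. The abstract does not say how the initial centroids are derived, so the seeding rule below (sort points by their first principal component score, split them into k equal groups, and use the group means) is an assumption made for illustration, not the authors' procedure. The function names `pca_reduce`, `initial_centroids`, and `kmeans`, and the synthetic data, are likewise hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

def initial_centroids(Z, k):
    """Assumed seeding rule: sort by first-component score, split into k
    equal groups, and return each group's mean as a starting centroid."""
    order = np.argsort(Z[:, 0])
    return np.array([Z[idx].mean(axis=0) for idx in np.array_split(order, k)])

def kmeans(Z, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(max_iter):
        # Distance from every point to every centroid (n_samples x k).
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic data (illustrative only): three groups of 40 points in 50
# dimensions, spaced along the main diagonal with unit Gaussian noise.
rng = np.random.default_rng(0)
centers = np.outer([0.0, 3.0, 6.0], np.ones(50))
X = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])

Z = pca_reduce(X, n_components=2)           # 50 dimensions -> 2
labels, centroids = kmeans(Z, initial_centroids(Z, k=3))
```

Because the seeding is deterministic, repeated runs on the same data give the same clusters, which sidesteps the sensitivity to initial partitions that the abstract mentions. Whether this particular rule improves accuracy on real data is not something the sketch establishes.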
Authors and Affiliations
P. Prabhu , N. Anbazhagan
IMAGE SEGMENTATION BY USING EDGE DETECTION
In this paper, we present methods for edge segmentation of satellite images; we used seven techniques for this category: Sobel operator technique, Prewitt technique, Kirsch technique, Laplacian technique, Canny technique...
Comparative Study of Three Declarative Knowledge Representation Techniques
In artificial intelligence, solving a problem requires a knowledge base, consisting of all information related to the problem domain, and a method for manipulating that knowledge to find the solution. For better result kno...
DYNAMIC WAVELENGTH ALLOCATION IN WDM OPTICAL NETWORKS
This paper investigates the problem of dynamic wavelength allocation and fairness control in WDM optical networks. A f network topology, with a two-hop path network, is studied for three classes of traffic. Each class co...
A generic conceptual and UML model for the multi-echelon distribution supply chain
The Multi-Echelon Distribution Supply Chain (MEDSC) is a multifaceted structure, focusing on the integration of all factors involved in the overall distribution process of finished products to the customers. The gro...
Overcoming Testing Challenges in Project Life Cycle using Risk Based Validation Approach
According to James Whittaker, Microsoft Testing Expert and Author, “There are a number of trends that testers are going to have to grapple with. The first is that software is getting better. The result of this is that bug...