Improving the Performance of K-Means Clustering For High Dimensional Data Set
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 6
Abstract
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such data are hard to reason about, impossible to visualize, and, because the number of possible values grows exponentially with each added dimension, impossible to enumerate. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed by an efficient dimensionality reduction method such as Principal Component Analysis (PCA). Cluster analysis in high-dimensional data is the process of fast identification and efficient description of clusters. The clusters have to be of high quality with regard to a suitably chosen homogeneity measure. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters represented by their centroids. It is difficult to compare the quality of the clusters produced. Different initial partitions can result in different final clusters. Hence, in this paper we propose using Principal Component Analysis to reduce the data set from high-dimensional to low-dimensional. The new method is then used to find the initial centroids, making the algorithm more effective and efficient. Comparing the results of the original and proposed methods shows that the results obtained from the proposed method are more accurate.
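The pipeline described in the abstract (reduce dimensionality with PCA, then run K-means from deliberately chosen initial centroids) can be illustrated with a minimal sketch. The abstract does not state how the initial centroids are derived, so the heuristic in `initial_centroids` (sort the points by their first principal component, split them into k equal groups, and take each group's mean) is an assumption made purely for illustration and should not be read as the authors' method. The function names, the synthetic data, and the parameter values are likewise hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def initial_centroids(Z, k):
    """Assumed heuristic (not taken from the paper): sort points by their
    first principal component, split them into k equal-sized groups, and
    use each group's mean as a starting centroid."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.array([Z[idx].mean(axis=0) for idx in groups])


def kmeans(Z, centroids, n_iter=100):
    """Standard K-means (Lloyd) iterations starting from the given centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = Z[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic data for illustration only: 3 groups of 40 points in 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])

Z = pca_reduce(X, n_components=2)
labels, centroids = kmeans(Z, initial_centroids(Z, k=3))
```

Because the starting centroids are computed deterministically from the reduced data, repeated runs on the same input give the same clustering, which sidesteps the sensitivity to random initial partitions that the abstract mentions.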
Authors and Affiliations
P. Prabhu, N. Anbazhagan