THE PROBLEM OF HIGH DIMENSIONALITY WITH LOW DENSITY IN CLUSTERING
Journal Title: International Journal of Engineering, Science and Mathematics - Year 2012, Vol 2, Issue 2
Abstract
In many real-world applications, a dataset contains a number of dimensions with large variations. These dimensions scatter the clusters and distort the distance between any two samples in the dataset, which degrades the performance of many existing algorithms. The problem can occur even when the number of dimensions is small. Moreover, no existing method can determine whether a dataset suffers from the highly repeated problem or the low-density problem; the two can be distinguished only through prior knowledge supplied by the user. There are many methods for resolving this type of high-dimensionality problem. The common approach is to prune the non-significant features, so that the features with large variations are removed and high-density cluster centers are obtained. Much research has been carried out on the basis of this criterion. Subspace clustering is one of the best-known tools: the feature space is first partitioned into a number of equal-length grid intervals, and the density of each interval is then measured. The features with low density are discarded, and clustering is conducted on the high-density regions. Although these methods work very well on synthetic datasets, the pruned dimensions can carry useful information, and hence pruning them may increase the classification error rate.
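The grid-based pruning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-feature binning, the peak-density criterion, the threshold value, and the toy data are all assumptions chosen for illustration. Each feature is split into equal-length intervals, the fraction of samples in each interval is measured, and a feature is kept only if at least one interval is dense enough.

```python
def interval_densities(values, n_bins):
    """Fraction of samples in each of n_bins equal-length intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: every sample falls in the first interval.
        return [1.0] + [0.0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is clamped into the last interval.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def select_dense_features(data, n_bins=5, min_peak_density=0.3):
    """Return indices of features whose densest interval reaches the threshold.

    min_peak_density is a hypothetical threshold, not taken from the paper.
    """
    kept = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        if max(interval_densities(column, n_bins)) >= min_peak_density:
            kept.append(j)
    return kept


# Toy data (assumed): features 0 and 1 form two tight clusters,
# while feature 2 is spread evenly over a wide range.
data = [
    [1.00, 2.00, 40.0],
    [1.10, 2.10, 0.0],
    [0.90, 1.90, 70.0],
    [1.05, 2.05, 20.0],
    [0.95, 1.95, 90.0],
    [9.00, 6.00, 10.0],
    [9.10, 6.10, 60.0],
    [8.90, 5.90, 30.0],
    [9.05, 6.05, 80.0],
    [8.95, 5.95, 50.0],
]

kept = select_dense_features(data, n_bins=5, min_peak_density=0.3)
print(kept)
```

On this toy data the widely spread third feature is the one discarded. As the abstract cautions, a feature pruned this way may still carry information that matters for classification, so the threshold trades cluster density against information loss.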
Authors and Affiliations
Prof. T. Sudha and Swapna Sree Reddy. Obili
MODERN PRACTICES FOR EFFECTIVE SOFTWARE DEVELOPMENT PROCESS IN PROJECT MANAGEMENT
The effectiveness of software project management is dependent on multi-disciplinary, interrelated factors including the management of project scope, project time, project cost, project quality, project human resource,...
AGRICULTURAL PRODUCTIVITY IN KALAHANDI DISTRICT OF ORISSA OVER THE DECADES: A TEMPORAL ASSESSMENT
Agriculture supports a large segment of population by providing opportunities for employment and earning a livelihood. The undivided Kalahandi district of Orissa is one of the poorest regions of the state with about 70...
Identification of Paraphrasing in the context of Plagiarism
Paraphrasing is a very important form of processing for natural language processing (NLP). A characteristic property of natural language is that various expressions can exist to express a single concept. The aim of thi...
Fuzzy Controller for an Image based Traffic System.
Traffic problems nowadays are increasing because of the growing number of vehicles and the limited resources provided by current infrastructures. The simplest way for controlling a traffic light uses timer for each pha...