K-Means Document Clustering using Vector Space Model
Journal Title: Bonfring International Journal of Data Mining - Year 2015, Vol 5, Issue 2
Abstract
Document clustering groups similar documents into classes, where similarity is some function defined on the documents. It requires neither a separate training process nor manual tagging of groups in advance. Documents in the same cluster are more similar to one another, while documents in different clusters are more dissimilar. It is a familiar technique in data analysis and is used in many areas, including data mining, statistics and image analysis. Traditional clustering approaches become less effective when handling high-dimensional data. To address this, a new K-Means clustering technique is proposed in this work. Here, the cosine similarity of the Vector Space Model is used to compare documents with the cluster centroids. Using this approach, documents can be clustered efficiently even when the dimension is high, because the vector space representation of documents is well suited to high-dimensional data.
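The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the term weighting, the initialization or the stopping rule, so the TF-IDF weighting, the `seeds` parameter, and the function names `tfidf_vectors`, `normalize`, `cosine` and `kmeans_cosine` are all assumptions made for illustration. Each document becomes a unit-length vector, and each document is assigned to the centroid with which it has the highest cosine similarity.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def tfidf_vectors(docs):
    """Assumed weighting: turn raw strings into unit-length TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a nonzero weight.
        vec = [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in vocab]
        vectors.append(normalize(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, seeds, max_iter=100):
    """K-Means where assignment uses cosine similarity to each centroid.

    `seeds` (hypothetical parameter) lists the document indices used as
    the initial centroids, which keeps the sketch deterministic.
    """
    centroids = [vectors[i] for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each document to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the normalized mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


docs = [
    "cluster documents with k-means",
    "k-means cluster centroid documents",
    "roc curve medical diagnosis",
    "medical diagnosis roc curve area",
]
vectors = tfidf_vectors(docs)
labels = kmeans_cosine(vectors, k=2, seeds=[0, 2])
print(labels)
```

In this toy corpus the two topics share no terms, so the first two documents fall into one cluster and the last two into the other. Normalizing the vectors means the dot product equals the cosine similarity, which keeps the assignment step cheap even when the vocabulary, and therefore the dimension, is large.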
Authors and Affiliations
R. Malathi Ravindran, Dr. Antony Selvadoss Thanamani
Conditional Variables Double Sampling Plan for Weibull Distributed Lifetimes under Sudden Death Testing
In this paper, we propose a conditional sampling plan called conditional double sampling plan for lot acceptance of parts whose life time follows a Weibull distribution with known shape parameter under sudden death testin...
Estimation of Area under the ROC Curve Using Exponential and Weibull Distributions
In recent years the Receiver Operating Characteristic (ROC) curves received much attention in medical diagnosis for classifying the subjects into one of the two groups. Many researchers have provided the mathematical for...
Construction of Graeco Sudoku Square Designs of Odd Orders
The Sudoku puzzle typically consists of a nine-by-nine grid, in which some of the spaces contain numbers; most of the spaces are blank. The goal is to fill in the blanks with digits from 1 to 9 so that each row, each col...
Review on Analysis of Gene Expression Data Using Biclustering Approaches
In this paper, survey on biclustering approaches for Gene Expression Data (GED) is carried out. Some of the issues are Correlation, Class discovery, Coherent biclusters and coregulated biclusters. Each table entry is cal...
A Class of Harmonic Meromorphic Functions of Complex Order
The seminal work of Clunie and Sheil-Small [3] on harmonic mappings gave rise to studies on subclasses of complex-valued harmonic univalent functions. In this paper a class of harmonic meromorphic functions of the form f...