A Novel Approach for Semi Supervised Document Clustering with Constraint Score based Feature Supervision
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2014, Vol 16, Issue 2
Abstract
Text document clustering is an effective way to manage a large volume of retrieval results by grouping documents into a small number of meaningful classes. In unsupervised clustering, unlabeled input data is used to estimate the parameter values. In semi-supervised document clustering, both labeled and unlabeled input data are used. This paper proposes a semi-supervised clustering method with feature supervision and a constraint score. The proposed system handles document clustering and feature supervision simultaneously, and it finds the number of clusters automatically. Feature supervision uses pairwise constraints, which provide supervision between pairs of documents. The semi-supervised constraint score, computed from the pairwise constraints, is used to identify relevant and irrelevant features in the document data set. Document clustering is performed with a variational inference algorithm for the Dirichlet Process Mixture model.
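The abstract does not give the scoring formula, so the following is only a minimal sketch of one common pairwise constraint score, assumed here for illustration: for each feature, the squared differences over must-link document pairs are divided by the squared differences over cannot-link pairs, and lower scores indicate more relevant features. The function names, the `eps` guard, and the toy data are hypothetical and are not taken from the paper.

```python
def constraint_scores(docs, must_link, cannot_link, eps=1e-12):
    """Score each feature; a lower score means better agreement with the constraints.

    docs        : list of equal-length feature vectors (e.g. term weights)
    must_link   : list of (i, j) index pairs that should share a cluster
    cannot_link : list of (i, j) index pairs that should be separated
    eps         : small constant (assumption) to avoid division by zero
    """
    n_features = len(docs[0])
    scores = []
    for r in range(n_features):
        # Spread of feature r across pairs that should be together.
        within = sum((docs[i][r] - docs[j][r]) ** 2 for i, j in must_link)
        # Spread of feature r across pairs that should be apart.
        between = sum((docs[i][r] - docs[j][r]) ** 2 for i, j in cannot_link)
        scores.append(within / (between + eps))
    return scores


def select_features(scores, k):
    """Return the indices of the k lowest-scoring features, in ascending index order."""
    ranked = sorted(range(len(scores)), key=lambda r: scores[r])
    return sorted(ranked[:k])


# Toy data: 4 documents, 3 term features (hypothetical values).
docs = [
    [1.0, 0.2, 5.0],
    [1.1, 0.9, 5.0],
    [4.0, 0.1, 5.1],
    [4.2, 0.8, 4.9],
]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]

scores = constraint_scores(docs, must_link, cannot_link)
relevant = select_features(scores, k=1)
```

In this toy example, feature 0 varies little within must-link pairs and strongly across cannot-link pairs, so it receives the lowest score and is the one selected. A semi-supervised variant would typically also draw on the unlabeled documents, which this sketch leaves out.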
Authors and Affiliations
S. Princiya, M. Prabakaran
A Novel Approach To Topological Skeletonization Of English Alphabets And Characters
In this paper we put forward a modified approach towards skeletonization of English alphabets and characters. This algorithm has been designed to find the skeleton of all the typeface of Modern English as present i...
A Brief Survey on Privacy Preserving Techniques in Data Mining
Abstract: Data mining is a process of extracting the required information from large datasets. Privacy preserving data mining deals with hiding a person’s sensitive identity without losing the usability of data. Sensitiv...
Data Classification Algorithm Using k-Nearest Neighbour Method Applied to ECG Data
In medical science, the importance of the Electrocardiography is remarkable since heart diseases constitute one of the major causes of mortality in the world. Electrocardiogram (ECG) is the only way for doctors...
Classification of Student’s E-Learning Experiences’ in Social Media via Text Mining
Abstract: In today’s world, social media is used by every individual for expressing their feelings, opinions, experiences and emotions. Applying data mining on all these emotions expressed in posts, comments and likes called...
Survey on Load Rebalancing For Distributed File Systems in Clouds
Cloud computing is an emerging technology based on on-demand service, in which shared resources, information, software and other devices are provided according to the clients' requirements at a specific time with...