Survey on Feature Selection in Document Clustering

Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 3

Abstract

Text mining is the study of technologies for discovering useful knowledge in very large collections of documents, and for building systems that deliver that knowledge and support decision making. A cluster is a group of similar data items; document clustering partitions a collection into groups of similar documents. Clustering is a fundamental data analysis technique used in fields such as biology, psychology, control and signal processing, information theory, and mining technologies. Text mining is not a stand-alone task that human analysts typically engage in. The goal is to transform text written in everyday language into a structured, database format, so that heterogeneous documents are summarized and presented in a uniform manner. Among the challenging problems of text clustering are large volume, high dimensionality, and complex semantics.
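To make the link between high dimensionality and feature selection concrete, the following is a minimal sketch, not a method taken from the survey. It assumes a simple document-frequency threshold as the selection rule; the sample documents, the helper names (`tokenize`, `select_features`, `vectorize`, `cosine`), and the `min_df` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately crude tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]

def select_features(docs, min_df=2):
    # Document frequency = number of documents in which a term occurs.
    # Assumption: terms seen in fewer than min_df documents are dropped.
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return sorted(term for term, n in df.items() if n >= min_df)

def vectorize(doc, vocabulary):
    # Term-count vector restricted to the selected vocabulary.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: two biology-like and two signal-processing-like documents.
docs = [
    "gene protein cell biology",
    "protein cell membrane gene",
    "signal noise filter control",
    "control signal filter feedback",
]

vocabulary = select_features(docs, min_df=2)   # 6 of the 10 distinct terms survive
vectors = [vectorize(d, vocabulary) for d in docs]

print(vocabulary)
print(cosine(vectors[0], vectors[1]))  # same-topic pair
print(cosine(vectors[0], vectors[2]))  # cross-topic pair
```

In this toy corpus the threshold removes the four terms that occur in only one document, and the remaining six dimensions still separate the two topics. A clustering algorithm would then operate on the reduced vectors.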

Authors and Affiliations

Ms. K. Mugunthadevi, Mrs. S. C. Punitha, Dr. M. Punithavalli

Keywords

Text mining, feature selection, information retrieval, ontology, document clustering

EP ID EP113574