An Efficient Text Clustering Approach using Affinity Propagation with weight modification
Journal Title: International Journal on Computer Science and Engineering - Year 2014, Vol 6, Issue 5
Abstract
Text mining has recently emerged as one of the most important fields of data mining, because most web search is driven by user-supplied text and because social networks rely on text as a major component. Extracting useful information from such text, directly or indirectly, requires an efficient clustering algorithm. The most widely used techniques rely on the vector space model, which represents a text as an n-tuple numeric array (vector) in which each dimension corresponds to a unique word and its value is the weight of that word, computed from term frequency-inverse document frequency (tf-idf). This technique has two problems. First, the number of unique words in a document may be very large, which produces correspondingly long vectors whose processing requires large amounts of memory and processing power. Second, an analysis may require grouping that is biased toward particular categories, which the technique does not address. This paper therefore presents an efficient clustering approach that uses one dimension for each group of words representing a similar area of interest, and that weights the dimensions unevenly according to the categorical bias during clustering. After the vectors are created, clustering is performed using the seeds affinity clustering technique. To study its performance, the presented algorithm is applied to the benchmark data set Reuters-21578 and compared with the k-means algorithm and the original affinity propagation (AP) algorithm in terms of F-measure, entropy, and execution time. The results show that the presented algorithm outperforms the others by an acceptable margin.
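The idea of one dimension per word group with uneven dimension weights can be illustrated with a minimal sketch. It is not the authors' implementation: the word groups, the weights, the normalization by document length, and the function names `group_vector` and `weighted_similarity` are all assumptions made for illustration. The similarity used is a weighted negative squared Euclidean distance, a common input to affinity propagation.

```python
from collections import Counter

# Hypothetical word groups: each group of related words shares ONE dimension,
# instead of one dimension per unique word as in a plain tf-idf vector.
WORD_GROUPS = {
    "finance": {"bank", "loan", "stock", "market"},
    "sports": {"match", "team", "goal", "score"},
    "energy": {"oil", "crude", "barrel", "gas"},
}
# Hypothetical per-dimension weights expressing a categorical bias.
GROUP_WEIGHTS = {"finance": 2.0, "sports": 1.0, "energy": 1.0}
GROUPS = sorted(WORD_GROUPS)  # fixed dimension order


def group_vector(text):
    """Map a document to one length-normalized count per word group."""
    tokens = text.lower().split()
    counts = Counter()
    for tok in tokens:
        for g in GROUPS:
            if tok in WORD_GROUPS[g]:
                counts[g] += 1
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[g] / total for g in GROUPS]


def weighted_similarity(u, v):
    """Weighted negative squared Euclidean distance (higher = more similar)."""
    return -sum(GROUP_WEIGHTS[g] * (a - b) ** 2 for g, a, b in zip(GROUPS, u, v))


docs = [
    "bank raises loan rates as stock market falls",
    "stock market rallies after bank results",
    "team wins match with late goal",
]
vectors = [group_vector(d) for d in docs]
# Pairwise similarity matrix that a clustering step could consume.
similarity = [[weighted_similarity(u, v) for v in vectors] for u in vectors]
```

Each vector here has three dimensions regardless of vocabulary size, and raising a group's weight makes differences along that dimension count more when documents are compared.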
Authors and Affiliations
Isha Sharma, Prof. Mahak Motwani
A Data Mining Perspective on the Prevalence of Polio in India
Polio, although it has been eradicated from many parts of the world, still continues to be prevalent in countries like India, Nigeria and Pakistan. This is a source of concern because a single human carrier in any part of th...
An agent-based Intelligent System to enhance E-Learning through Mining Techniques
The growth of the Internet has created new ways for education systems. Learners and teachers realize their pedagogic activities with less effort, time and money. Agent Based Intelligent System (ABIS) have proved their worth...
Spam Classification using new kernel function in Support Vector Machine
Due to the increase in internet users, there is a rapid growth in spam e-mails. In recent years, kernel functions have received major attention, particularly due to the increased popularity of Support Vector Machine. It i...
Relation based Ontology Matching using Alignment Strategies
The set of relations within a knowledge domain can be expressed with the help of an ontology, but data within the knowledge domain gets scattered all over its space. To get the most precise result, it is necessary to rela...
Multiplication Algorithms for VLSI - A Review
In today’s digital world, where portable computers have become as small as a palm, limitations on processing speed have increased. Thus there’s a need for modification in the traditional approach to overcome this...