Computational Intelligence Methods for Clustering of SenseTagged Nepali Documents

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 1

Abstract

This paper presents a method for document clustering that hybridizes the self-organizing map (SOM), particle swarm optimization (PSO), and the K-means clustering algorithm. Document representation is an important step for clustering purposes. A common way to represent a text is the bag-of-words approach. This approach is simple but has two drawbacks, synonymy and polysemy, which arise because of the ambiguity of words and the lack of information about the relations between them. To avoid these drawbacks, the words in this paper are tagged with their senses in WordNet. Sense tagging provides the exact sense of each word. Feature vectors are generated from the sense-tagged documents, and clustering is carried out using the proposed hybrid SOM+PSO+K-means algorithm. In the proposed algorithm, SOM is first applied to the feature vectors to produce prototypes, and the K-means clustering algorithm is then applied to cluster the prototypes. The particle swarm optimization algorithm is used to find the initial centroids for the K-means algorithm. Text documents in the Nepali language are used to test the hybrid SOM+PSO+K-means clustering algorithm.
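The three stages named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the SOM grid size, the PSO fitness function, or any parameter values, so the 4x4 grid, the sum-of-squared-errors fitness, the inertia and acceleration coefficients, and all function names here are assumptions. Random synthetic vectors stand in for the sense-tagged feature vectors.

```python
import numpy as np

def train_som(X, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM on X and return its prototype (weight) vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every map unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumption: prototypes start as randomly chosen input vectors.
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-2    # shrinking neighbourhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # best matching unit
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)
            step += 1
    return W

def sse(P, C):
    """Sum of squared distances from each row of P to its nearest centroid."""
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pso_centroids(P, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k initial centroids with PSO; fitness is SSE on P (assumed)."""
    rng = np.random.default_rng(seed)
    # Each particle encodes one candidate set of k centroids.
    pos = np.stack([P[rng.choice(len(P), k, replace=False)]
                    for _ in range(n_particles)]).astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(P, p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([sse(P, p) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest

def nearest(P, C):
    """Index of the nearest centroid in C for each row of P."""
    return ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(P, C, iters=50):
    """Standard K-means on P, started from the given centroids C."""
    C = C.copy()
    for _ in range(iters):
        labels = nearest(P, C)
        # An empty cluster keeps its previous centroid.
        new_C = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, nearest(P, C)

def hybrid_cluster(X, k):
    """SOM prototypes -> PSO initial centroids -> K-means on the prototypes."""
    W = train_som(X)
    C, _ = kmeans(W, pso_centroids(W, k))
    # Each document takes the cluster of its nearest final centroid.
    return nearest(X, C), C

# Synthetic stand-in for sense-tagged document feature vectors.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((20, 4)) for c in centers])
labels, C = hybrid_cluster(X, k=3)
print(np.bincount(labels, minlength=3))  # number of documents per cluster
```

Running K-means on the SOM prototypes rather than on the raw vectors keeps the PSO and K-means stages cheap, since the number of prototypes is fixed by the map size. How documents are finally mapped to clusters is not stated in the abstract; the nearest-centroid assignment above is one plausible choice.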

Authors and Affiliations

Sunita Sarkar, Arindam Roy, Bipul Syam Purkayastha

How To Cite

Sunita Sarkar, Arindam Roy, Bipul Syam Purkayastha (2015). Computational Intelligence Methods for Clustering of SenseTagged Nepali Documents. IOSR Journals (IOSR Journal of Computer Engineering), 17(1), 83-89. https://europub.co.uk/articles/-A-127150