Computational Intelligence Methods for Clustering of SenseTagged Nepali Documents

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 1

Abstract

This paper presents a method for document clustering that hybridizes the self-organizing map (SOM), particle swarm optimization (PSO), and the k-means clustering algorithm. Document representation is an important step in clustering. The common way of representing a text is the bag-of-words approach. This approach is simple but has two drawbacks, synonymy and polysemy, which arise from the ambiguity of words and the lack of information about the relations between them. To avoid these drawbacks, in this paper words are tagged with their senses in WordNet. Sense tagging provides the exact sense of each word. Feature vectors are generated from the sense-tagged documents, and clustering is carried out using the proposed hybrid SOM+PSO+K-means algorithm. In the proposed algorithm, SOM is first applied to the feature vectors to produce prototypes, and the K-means clustering algorithm is then applied to cluster the prototypes. The particle swarm optimization algorithm is used to find the initial centroids for K-means. Text documents in the Nepali language are used to test the hybrid SOM+PSO+K-means clustering algorithm.
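The three-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the sense-tagged feature vectors are already available as a NumPy array, and every function name, grid size, and hyperparameter here (`train_som`, `pso_init`, `kmeans`, `cluster_documents`, the 3x3 grid, the PSO coefficients) is a hypothetical choice made for the example.

```python
import numpy as np

def train_som(X, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Stage 1: train a small SOM and return its prototype vectors."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    # Grid coordinates of each unit, used by the Gaussian neighbourhood.
    coords = np.array([(i, j) for i in range(grid[0])
                       for j in range(grid[1])], dtype=float)
    W = X[rng.integers(0, len(X), n_units)].astype(float)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3    # shrinking neighbourhood width
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
    return W

def sse(centroids, P):
    """Sum of squared distances from each point to its nearest centroid."""
    d2 = ((P[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pso_init(P, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Stage 2: PSO search for k initial centroids (assumed fitness: SSE)."""
    rng = np.random.default_rng(seed)
    # Each particle encodes one candidate set of k centroids.
    pos = P[rng.integers(0, len(P), (n_particles, k))].astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([sse(p, P) for p in pos])
    g = pbest[np.argmin(pbest_cost)].copy()
    g_cost = pbest_cost.min()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        cost = np.array([sse(p, P) for p in pos])
        better = cost < pbest_cost
        pbest[better] = pos[better]
        pbest_cost[better] = cost[better]
        if pbest_cost.min() < g_cost:
            g_cost = pbest_cost.min()
            g = pbest[np.argmin(pbest_cost)].copy()
    return g

def kmeans(P, centroids, iters=20):
    """Stage 3: plain k-means on the prototypes, from the given centroids."""
    centroids = centroids.copy()
    for _ in range(iters):
        d2 = ((P[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            if np.any(labels == j):           # leave empty clusters unchanged
                centroids[j] = P[labels == j].mean(axis=0)
    d2 = ((P[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids

def cluster_documents(X, k, grid=(3, 3)):
    """Give each document the cluster of its best-matching SOM prototype."""
    W = train_som(X, grid)
    proto_labels, _ = kmeans(W, pso_init(W, k))
    bmus = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return proto_labels[bmus]
```

Calling `cluster_documents(X, k)` on an array `X` of shape `(n_documents, n_features)` returns one integer cluster label per document. Using the within-cluster sum of squared errors as the PSO fitness is an assumption of this sketch; the abstract does not state which fitness function the paper uses.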

Authors and Affiliations

Sunita Sarkar, Arindam Roy, Bipul Syam Purkayastha

How To Cite

Sunita Sarkar, Arindam Roy, Bipul Syam Purkayastha (2015). Computational Intelligence Methods for Clustering of SenseTagged Nepali Documents. IOSR Journals (IOSR Journal of Computer Engineering), 17(1), 83-89. https://europub.co.uk/articles/-A-127150