A Survey on Techniques used for Sentence Clustering of Text Documents

Abstract

Clustering techniques are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features, so that the data points within a group are more similar to each other than to points in different groups. Fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This survey reviews several clustering algorithms from the literature, compares them, and identifies problems in existing systems. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to text data, and new algorithms have recently been proposed specifically for text data. This survey discusses the different clustering algorithms and similarity measures available and identifies the problems of current systems. Finally, a new model for fuzzy clustering of sentence data is proposed.
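
The abstract's key point is that sentence clustering needs graded memberships computed from pairwise sentence similarities rather than from cluster prototypes in a vector space. The sketch below illustrates that idea only; it is not the model proposed in this survey. It assumes a simple word-overlap (Jaccard) similarity and hand-picked "medoid" sentences as cluster representatives; the function names `tokens`, `jaccard` and `fuzzy_memberships` are hypothetical.

```python
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set, with punctuation stripped from word edges."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; it needs only the two sentences,
    not a shared vector space or a cluster mean."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_memberships(sentences: List[str], medoids: List[int],
                      eps: float = 1e-9) -> List[List[float]]:
    """Hypothetical soft assignment: a sentence's membership in each cluster is
    proportional to its similarity to that cluster's medoid sentence.
    Each row sums to 1; eps gives uniform memberships when a sentence
    shares no words with any medoid (and avoids division by zero)."""
    rows = []
    for s in sentences:
        sims = [jaccard(s, sentences[m]) + eps for m in medoids]
        total = sum(sims)
        rows.append([x / total for x in sims])
    return rows


sentences = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "the cat watched stock prices on tv",
]
# Assumption: sentences 0 and 1 are chosen by hand as the two cluster medoids.
U = fuzzy_memberships(sentences, medoids=[0, 1])
print([round(x, 3) for x in U[2]])  # → [0.625, 0.375]
```

The third sentence mixes both topics, so it receives partial membership in each cluster instead of being forced into one. A full method would also have to choose the representatives automatically and iterate; that part is left out here.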

Authors and Affiliations

Jinto Jacob

Related Articles

A Survey on Current Cloud Computing Trends and Related Security Issues

Cloud Computing is an emerging technology which provides services on a pay-as-you-go basis. It provides resources (e.g. CPU and storage) as general utilities that can be leased and released by users through...

Comparative Studies on Methods of Tannase Assay

Five bacterial cultures were isolated from the tannery effluent and were screened by plate assay method. Three different methods mainly Colorimetric, UV spectrophotometric and Spectrophotometric methods were selected fo...

Irrigation System and Its Methods

This paper provides information on “Irrigation system and its various methods”. Many old and new techniques are now available for irrigation, including modern technology and automation, i.e. the use of sen...

Feeder Zone Control and Metal Impurity Detection in Blow Room Machines

This project has been designed to detect the metal impurities present in the raw cotton input for the blow room machines and to monitor the level of cotton in the feeder zone of the machine. This is done with the help o...

GSM Based Finger Vein Authentication Using Near-Infrared Imaging

This paper discusses contactless finger vein authentication, which uses a person's vein patterns for verification. It is a personal identification system that is based on near-infrared (wavelength between...

How To Cite

Jinto Jacob (2014). A Survey on Techniques used for Sentence Clustering of Text Documents. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(6), -. https://europub.co.uk/articles/-A-18262