A Survey on Techniques used for Sentence Clustering of Text Documents

Abstract

Clustering techniques are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features, so that data points within a group are more similar to each other than to points in different groups. Fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This survey reviews a selection of clustering algorithms, compares them, and identifies problems in existing systems. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to text data, and new algorithms have recently been proposed specifically for text data. The survey discusses the different clustering algorithms and similarity measures available. Finally, it proposes a new model for fuzzy clustering of sentence data.
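The point that fuzzy sentence clustering must work without a common metric space can be illustrated with a small sketch. It is not the model proposed in this survey; it swaps in fuzzy c-medoids, a relational fuzzy method that needs only pairwise dissimilarities, so cluster representatives are actual sentences rather than averaged vectors. Jaccard word overlap stands in for a real sentence similarity measure, and the farthest-first initialisation, the rule that skips already-taken medoids, and all function names are hypothetical simplifications chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (a hypothetical, simple choice)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def distance_matrix(sentences: List[str]) -> Matrix:
    """Pairwise dissimilarity d = 1 - similarity; no vector space is built."""
    return [[1.0 - jaccard_similarity(a, b) for b in sentences] for a in sentences]


def init_medoids(dist: Matrix, c: int) -> List[int]:
    """Deterministic farthest-first start: item 0, then the item farthest from the chosen set."""
    medoids = [0]
    while len(medoids) < c:
        rest = [j for j in range(len(dist)) if j not in medoids]
        medoids.append(max(rest, key=lambda j: min(dist[i][j] for i in medoids)))
    return medoids


def memberships(dist: Matrix, medoids: List[int], m: float = 2.0, eps: float = 1e-12) -> Matrix:
    """u[i][j]: degree to which item j belongs to cluster i; each column sums to 1.
    The fuzzifier m must be greater than 1."""
    n, c = len(dist), len(medoids)
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d = [dist[med][j] for med in medoids]
        at_zero = [i for i in range(c) if d[i] < eps]
        if at_zero:
            # Item coincides with one or more medoids: share its membership among them.
            for i in at_zero:
                u[i][j] = 1.0 / len(at_zero)
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0)) for k in range(c))
    return u


def update_medoids(dist: Matrix, u: Matrix, m: float = 2.0) -> List[int]:
    """For each cluster, pick the item minimising the membership-weighted distance sum.
    Medoids already taken by earlier clusters are skipped (a simplification)."""
    n = len(dist)
    chosen: List[int] = []
    for row in u:
        w = [x ** m for x in row]
        candidates = [k for k in range(n) if k not in chosen]
        chosen.append(min(candidates, key=lambda k: sum(w[j] * dist[k][j] for j in range(n))))
    return chosen


def fuzzy_c_medoids(dist: Matrix, c: int, m: float = 2.0, max_iter: int = 100) -> Tuple[List[int], Matrix]:
    """Alternate membership and medoid updates until the medoids stop changing."""
    medoids = init_medoids(dist, c)
    for _ in range(max_iter):
        new = update_medoids(dist, memberships(dist, medoids, m), m)
        if new == medoids:
            break
        medoids = new
    return medoids, memberships(dist, medoids, m)


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stock prices fell sharply today",
        "stock markets fell sharply today",
        "the cat watched stock prices fall",
    ]
    medoids, u = fuzzy_c_medoids(distance_matrix(sentences), c=2)
    for j, s in enumerate(sentences):
        print(f"{s!r}: " + ", ".join(f"{u[i][j]:.2f}" for i in range(len(medoids))))
```

With these five sentences the medoids settle on the first and third sentences, and the last sentence, which mixes both topics, receives equal membership (0.50 and 0.50) in the two clusters. A hard clustering method would be forced to assign it to exactly one group.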

Authors and Affiliations

Jinto Jacob

How To Cite

Jinto Jacob (2014). A Survey on Techniques used for Sentence Clustering of Text Documents. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(6), -. https://europub.co.uk/articles/-A-18262