Implementation of Real World Document Clustering Using BIRCH

Abstract

Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are internally coherent but clearly dissimilar to the objects belonging to other clusters. In general, there are two common families of clustering algorithms. The first is the hierarchical family, which includes single linkage, complete linkage, group average and Ward's method. By aggregating or dividing, documents can be clustered into a hierarchical structure, which is suitable for browsing. However, such algorithms usually suffer from efficiency problems. The second family is built on the K-means algorithm and its variants. These algorithms can further be classified as hard or soft clustering algorithms. A hard clustering algorithm computes a hard assignment: each document is a member of exactly one cluster. A soft clustering algorithm computes a soft assignment: a document’s assignment is a distribution over all clusters, so the document has fractional membership in several clusters. The large variety of documents makes it almost infeasible to create a general algorithm that works best on all kinds of datasets.
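The distinction between hard and soft assignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it is not BIRCH itself; the function names (hard_assign, soft_assign), the temperature parameter, the toy two-dimensional document vectors, and the choice of a softmax over negative Euclidean distances are all assumptions made purely for illustration.

```python
import math

def hard_assign(doc, centroids):
    """Hard assignment: return the index of the single nearest centroid."""
    dists = [math.dist(doc, c) for c in centroids]
    return dists.index(min(dists))

def soft_assign(doc, centroids, temperature=1.0):
    """Soft assignment: return a membership distribution over all clusters.

    Assumption: memberships are a softmax over negative Euclidean distances.
    Other soft clustering methods define the memberships differently.
    """
    dists = [math.dist(doc, c) for c in centroids]
    nearest = min(dists)  # subtracted before exponentiating for numerical stability
    weights = [math.exp(-(d - nearest) / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: two cluster centroids and one document vector.
centroids = [(1.0, 0.0), (0.0, 1.0)]
doc = (0.8, 0.3)

hard = hard_assign(doc, centroids)   # a single cluster index
soft = soft_assign(doc, centroids)   # fractional memberships that sum to 1

print("hard assignment:", hard)
print("soft assignment:", [round(p, 3) for p in soft])
```

In this sketch the hard assignment picks only the closest centroid, while the soft assignment gives the document a nonzero share in every cluster, with the largest share going to the closest one.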

Authors and Affiliations

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari

How To Cite

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari (2016). Implementation of Real World Document Clustering Using BIRCH. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(5), -. https://europub.co.uk/articles/-A-22135