Implementation of Real World Document Clustering Using BIRCH

Abstract

Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are internally coherent but clearly dissimilar to the objects belonging to other clusters. In general, there are two common families of algorithms. The first is hierarchical clustering, which includes single linkage, complete linkage, group average and Ward's method. By aggregating or dividing, documents can be clustered into a hierarchical structure, which is suitable for browsing. However, such algorithms usually suffer from efficiency problems. The second family is built on the K-means algorithm and its variants. These algorithms can be further classified as hard or soft clustering algorithms. Hard clustering computes a hard assignment: each document is a member of exactly one cluster. Soft clustering computes a soft assignment: a document’s assignment is a distribution over all clusters, so the document has fractional membership in several clusters. The large variety of documents makes it almost infeasible to create a general algorithm that works best for all kinds of datasets.
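The difference between hard and soft assignment can be illustrated with a minimal Python sketch. It is not taken from the paper and is not the BIRCH algorithm itself: the function names, the `stiffness` parameter, the toy two-dimensional vectors, and the use of an exponential weighting over squared Euclidean distance are all assumptions chosen only for illustration.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assignment(doc, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [squared_distance(doc, c) for c in centroids]
    return distances.index(min(distances))


def soft_assignment(doc, centroids, stiffness=1.0):
    """Soft clustering: return membership weights over all clusters.

    The weights are non-negative and sum to 1. `stiffness` is a
    hypothetical parameter; larger values push the result toward
    a hard assignment.
    """
    distances = [squared_distance(doc, c) for c in centroids]
    closest = min(distances)  # subtracted for numerical stability
    weights = [math.exp(-stiffness * (d - closest)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data (assumed): two cluster centroids and one document vector.
centroids = [(1.0, 0.0), (0.0, 1.0)]
doc = (0.6, 0.4)

hard = hard_assignment(doc, centroids)  # exactly one cluster index
soft = soft_assignment(doc, centroids)  # one weight per cluster
print(hard, soft)
```

Here the document belongs to exactly one cluster under the hard assignment, while the soft assignment gives it a fractional membership in both clusters, with the larger weight on the nearer centroid.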

Authors and Affiliations

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari

How To Cite

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari (2016). Implementation of Real World Document Clustering Using BIRCH. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(5), -. https://europub.co.uk/articles/-A-22135