Implementation of Real World Document Clustering Using BIRCH

Abstract

Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are internally coherent but clearly dissimilar to the objects belonging to other clusters. In general, there are two common families of clustering algorithms. The first is hierarchical, and includes the single-link, complete-link, group-average and Ward's methods. By aggregating or dividing, documents can be clustered into a hierarchical structure that is suitable for browsing. However, such algorithms usually suffer from efficiency problems. The second family is built on the K-means algorithm and its variants. These algorithms can further be classified as hard or soft clustering algorithms. Hard clustering computes a hard assignment: each document is a member of exactly one cluster. Soft clustering computes a soft assignment: a document’s assignment is a distribution over all clusters, so the document has fractional membership in several clusters. The large variety of documents makes it almost infeasible to create a single general algorithm that works best for all kinds of datasets.
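The difference between hard and soft assignment can be illustrated with a minimal Python sketch. It is not taken from the paper: the two-dimensional document vectors, the three centroids, the function names and the `stiffness` parameter are all assumptions made for illustration, and the soft memberships are computed with a softmax over negative squared distances, which is only one of several possible choices.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assignment(doc, centroids):
    """Return the index of the single nearest centroid."""
    distances = [squared_distance(doc, c) for c in centroids]
    return distances.index(min(distances))


def soft_assignment(doc, centroids, stiffness=1.0):
    """Return a membership distribution over all clusters.

    Assumption: memberships are a softmax of negative squared distances;
    `stiffness` is a hypothetical parameter controlling how sharp it is.
    """
    distances = [squared_distance(doc, c) for c in centroids]
    smallest = min(distances)  # subtracting the minimum keeps exp() stable
    weights = [math.exp(-stiffness * (d - smallest)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: three cluster centroids and one document vector.
centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc = [0.9, 0.2]

hard = hard_assignment(doc, centroids)   # one cluster index
soft = soft_assignment(doc, centroids)   # one fraction per cluster

print("hard:", hard)
print("soft:", [round(p, 3) for p in soft])
```

The hard assignment returns exactly one cluster index, while the soft assignment returns fractions that sum to one, with the largest fraction on the nearest centroid.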

Authors and Affiliations

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari

How To Cite

Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari (2016). Implementation of Real World Document Clustering Using BIRCH. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 4(5), -. https://europub.co.uk/articles/-A-22135