Implementation of Real World Document Clustering Using BIRCH
Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2016, Vol 4, Issue 5
Abstract
Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are internally coherent but clearly dissimilar to the objects belonging to other clusters. In general, there are two common families of clustering algorithms. The first is hierarchical clustering, which includes single linkage, complete linkage, group average and Ward's method. By aggregating or dividing, documents can be clustered into a hierarchical structure, which is suitable for browsing. However, such algorithms usually suffer from efficiency problems. The second family is based on the K-means algorithm and its variants. These algorithms can further be classified as hard or soft clustering algorithms. Hard clustering computes a hard assignment: each document is a member of exactly one cluster. Soft clustering computes a soft assignment: a document’s assignment is a distribution over all clusters, so a document has fractional membership in several clusters. The large variety of documents makes it almost infeasible to create a general algorithm that works best for all kinds of datasets.
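The difference between hard and soft assignment can be illustrated with a minimal Python sketch. It is not taken from the paper: the two-dimensional document vectors, the fixed centroids, the function names and the `stiffness` parameter are all assumptions made for illustration, and the soft memberships are derived from an exponential of the negative Euclidean distance, which is only one of several possible choices.

```python
import math


def hard_assignment(doc, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [math.dist(doc, c) for c in centroids]
    return distances.index(min(distances))


def soft_assignment(doc, centroids, stiffness=1.0):
    """Soft clustering: return fractional memberships that sum to 1.

    Assumption: membership decays exponentially with distance; `stiffness`
    is a hypothetical parameter controlling how sharp the distribution is.
    """
    distances = [math.dist(doc, c) for c in centroids]
    d_min = min(distances)  # shift by the minimum for numerical stability
    weights = [math.exp(-stiffness * (d - d_min)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy data: two cluster centroids and one document vector.
centroids = [(1.0, 0.0), (0.0, 1.0)]
doc = (0.8, 0.3)

hard = hard_assignment(doc, centroids)   # one cluster index
soft = soft_assignment(doc, centroids)   # one membership per cluster
```

Here `hard` names exactly one cluster, while `soft` gives every cluster a share of the document, with the largest share going to the cluster that the hard assignment would pick.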
Authors and Affiliations
Prof. Praveen Kumar Gautam, Mrs. Sunita N. Chaudhari