Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

Abstract

Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) as a replacement for the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets: it builds the multi-tire hash table while scanning the documents only once. The multi-tire technique is used in the proposed hashing algorithm to avoid hash collisions. Based on the generated frequent termsets, the documents are partitioned, and clustering proceeds by grouping the partitions through their descriptive keywords. The MTHFT algorithm reduces the scanning and computational costs, which considerably improves performance and speeds up the clustering process. The CWDHFT approach improves accuracy, scalability and efficiency compared with existing clustering algorithms such as Bisecting K-means and FIHC.
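
The abstract does not spell out the MTHFT data structure, so the following is only a minimal sketch of the general idea of mining frequent termsets from a hash table filled in a single scan of the documents. It is not the authors' algorithm. It assumes one hash table per termset size (a guess at what a "tier" could look like) and derives larger termsets by intersecting document-id sets, in the style of vertical (Eclat-like) mining. The function name `mine_frequent_termsets`, its parameters, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_size=3):
    """Map each frequent termset (frozenset) to the ids of documents containing it.

    Assumption: a "tier" is modelled as one hash table per termset size.
    """
    # Tier 1: the only pass over the documents fills a table term -> doc ids.
    tier = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            tier[frozenset([term])].add(doc_id)
    tier = {ts: ids for ts, ids in tier.items() if len(ids) >= min_support}

    frequent = dict(tier)
    size = 1
    while tier and size < max_size:
        size += 1
        next_tier = {}
        # Higher tiers come from the previous tier's doc-id sets,
        # so the documents themselves are never scanned again.
        for (a, ids_a), (b, ids_b) in combinations(tier.items(), 2):
            candidate = a | b
            if len(candidate) != size or candidate in next_tier:
                continue
            ids = ids_a & ids_b  # documents that contain every term
            if len(ids) >= min_support:
                next_tier[candidate] = ids
        frequent.update(next_tier)
        tier = next_tier
    return frequent


# Hypothetical toy corpus: each document is a list of terms.
docs = [
    ["web", "mining", "cluster"],
    ["web", "mining"],
    ["web", "cluster"],
    ["hash"],
]
result = mine_frequent_termsets(docs, min_support=2)
for termset, doc_ids in result.items():
    print(sorted(termset), sorted(doc_ids))
```

In this sketch each frequent termset keeps the ids of the documents that contain it, which is the kind of partition a later clustering step could group by keyword.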

Authors and Affiliations

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem

How To Cite

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem (2013). Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 2(6), 6-14. https://europub.co.uk/articles/-A-141055