Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

Abstract

Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets: it builds the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the proposed hashing algorithm uses a Multi-Tire technique. Based on the generated frequent termsets, the documents are partitioned, and clustering occurs by grouping the partitions through their descriptive keywords. Using the MTHFT algorithm reduces the scanning and computational costs, which considerably improves performance and speeds up the clustering process. The CWDHFT approach improved accuracy, scalability, and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.
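The single-scan idea in the abstract can be illustrated with a minimal sketch. This is not the authors' MTHFT algorithm: it assumes a flat hash table (a Python dict) rather than the multi-tire structure, whose details the abstract does not give, and the names `mine_frequent_termsets`, `postings`, `min_support`, and `max_size` are hypothetical. It only shows how hashing terms to document ids during one pass lets larger termsets be counted without rescanning the documents.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_size=3):
    """Map each frequent termset (a frozenset) to the ids of documents containing it.

    Illustrative only: a flat dict stands in for the paper's multi-tire hash table.
    """
    # Single scan: hash every term to the set of document ids it occurs in.
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings[term].add(doc_id)

    # Level 1: single terms that occur in at least min_support documents.
    current = {
        frozenset([term]): ids
        for term, ids in postings.items()
        if len(ids) >= min_support
    }
    frequent = dict(current)

    # Higher levels: join frequent termsets and intersect their id sets,
    # so the documents themselves are never scanned again.
    size = 1
    while current and size < max_size:
        size += 1
        next_level = {}
        for (a, ids_a), (b, ids_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != size or candidate in next_level:
                continue
            ids = ids_a & ids_b
            if len(ids) >= min_support:
                next_level[candidate] = ids
        frequent.update(next_level)
        current = next_level
    return frequent
```

Each returned termset carries the ids of the documents that contain it, which is the kind of partition the abstract describes as the starting point for clustering.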

Authors and Affiliations

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem

How To Cite

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem (2013). Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 2(6), 6-14. https://europub.co.uk/articles/-A-141055