Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

Abstract

Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets: it builds the multi-tire hash table while scanning the documents only once. The Multi-Tire technique is used in the proposed hashing algorithm to avoid hash collisions. Based on the generated frequent termsets, the documents are partitioned, and clustering proceeds by grouping the partitions through their descriptive keywords. Using the MTHFT algorithm reduces the scanning cost and computational cost, which considerably improves performance and speeds up the clustering process. The CWDHFT approach improves accuracy, scalability, and efficiency compared with existing clustering algorithms such as Bisecting K-means and FIHC.
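
The abstract does not spell out the internals of MTHFT, so the following is only a minimal sketch of the general idea it describes: mining frequent termsets from hash tables built in a single scan of the documents, without Apriori's repeated rescans. The function name `mine_frequent_termsets`, the whitespace tokenization, and the tier-by-tier joining of document-id sets are assumptions made for illustration, not the authors' algorithm.

```python
from itertools import combinations


def mine_frequent_termsets(docs, min_support):
    """Map each frequent termset (a frozenset of terms) to the ids of documents containing it.

    Illustrative only: a single-scan, hash-table-based miner, not the paper's MTHFT.
    """
    # Tier 1: one scan of the documents builds a hash table from each term
    # to the set of ids of the documents that contain it.
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):  # assumed tokenizer: whitespace split
            index.setdefault(term, set()).add(doc_id)

    min_count = min_support * len(docs)
    tier = {frozenset([t]): ids for t, ids in index.items() if len(ids) >= min_count}
    frequent = dict(tier)

    # Higher tiers: join pairs of frequent k-termsets into (k+1)-termsets.
    # Support comes from intersecting doc-id sets, so the documents are never rescanned.
    while tier:
        next_tier = {}
        for (a, ids_a), (b, ids_b) in combinations(tier.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_tier:
                continue
            ids = ids_a & ids_b
            if len(ids) >= min_count:
                next_tier[candidate] = ids
        frequent.update(next_tier)
        tier = next_tier
    return frequent
```

Each returned termset comes with the ids of the documents that contain it, which is the kind of information a partitioning step based on frequent termsets could start from. How CWDHFT actually forms and merges partitions is not described in the abstract.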

Authors and Affiliations

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem

How To Cite

Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem (2013). Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets. International Journal of Advanced Research in Artificial Intelligence (IJARAI), 2(6), 6-14. https://europub.co.uk/articles/-A-141055