Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets
Journal Title: International Journal of Advanced Research in Artificial Intelligence (IJARAI), Year 2013, Vol. 2, Issue 6
Abstract
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and the reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets: it builds the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the proposed hashing algorithm uses a multi-tire technique. Based on the generated frequent termsets, the documents are partitioned, and clustering is performed by grouping the partitions through their descriptive keywords. Using the MTHFT algorithm reduces both the scanning cost and the computational cost, which considerably increases performance and speeds up the clustering process. Compared with existing clustering algorithms such as Bisecting K-means and FIHC, the CWDHFT approach improved accuracy, scalability and efficiency.
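The abstract does not spell out the structure of the multi-tire hash table. The following Python sketch therefore illustrates only the general idea, under stated assumptions. Each document is scanned once, and termsets of up to `max_len` terms are counted in one hash table per termset length (a hypothetical reading of a "tier"). Termsets below a minimum support are discarded, and each document is assigned to the longest frequent termset it contains. Python's built-in `dict` resolves collisions internally, so the sketch does not reproduce the paper's collision-avoidance scheme. It also omits the final step of grouping partitions by descriptive keywords. The function names and the `min_support` and `max_len` parameters are illustrative and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_len=3):
    """Count termsets of size 1..max_len in a single pass over the documents.

    Hypothetical structure: tiers[k] is a hash table (dict) mapping a sorted
    k-term tuple to the set of ids of documents that contain it.
    """
    tiers = {k: defaultdict(set) for k in range(1, max_len + 1)}
    for doc_id, text in enumerate(docs):            # single scan of the collection
        terms = sorted(set(text.lower().split()))   # distinct terms, sorted
        for k in range(1, max_len + 1):
            for termset in combinations(terms, k):
                tiers[k][termset].add(doc_id)
    min_count = min_support * len(docs)
    # Keep only termsets whose document support meets the threshold.
    return {
        termset: doc_ids
        for k in tiers
        for termset, doc_ids in tiers[k].items()
        if len(doc_ids) >= min_count
    }


def partition_documents(n_docs, frequent):
    """Assign each document to the longest frequent termset that covers it."""
    # Prefer longer termsets, then higher support, then lexicographic order.
    ranked = sorted(frequent.items(),
                    key=lambda kv: (-len(kv[0]), -len(kv[1]), kv[0]))
    partitions = defaultdict(list)
    assigned = set()
    for termset, doc_ids in ranked:
        for doc_id in sorted(doc_ids):
            if doc_id not in assigned:
                partitions[termset].append(doc_id)
                assigned.add(doc_id)
    unassigned = [d for d in range(n_docs) if d not in assigned]
    return dict(partitions), unassigned


docs = [
    "hash table frequent termset",
    "frequent termset mining apriori",
    "hash table collision",
    "document clustering keywords",
]
frequent = mine_frequent_termsets(docs, min_support=0.5)
partitions, unassigned = partition_documents(len(docs), frequent)
print(partitions)   # {('frequent', 'termset'): [0, 1], ('hash', 'table'): [2]}
print(unassigned)   # [3]
```

The sketch enumerates every k-combination of a document's distinct terms. That count grows quickly with document length, so code like this is practical only for short documents or a small `max_len`. It makes no attempt to reproduce the efficiency gains the paper reports.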
Authors and Affiliations
Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem