Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets
Journal Title: International Journal of Advanced Research in Artificial Intelligence (IJARAI) - Year 2013, Vol 2, Issue 6
Abstract
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets: it builds the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the Multi-Tire technique is used in the proposed hashing algorithm. Based on the generated frequent termsets, the documents are partitioned, and clustering proceeds by grouping the partitions through their descriptive keywords. By using the MTHFT algorithm, the scanning cost and computational cost are reduced; moreover, performance is considerably increased and the clustering process is sped up. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.
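The abstract does not spell out how the multi-tire hash table is organised, so the following is only a rough sketch of the general idea, not the authors' MTHFT algorithm. It assumes one hash table per termset size (a "tier"), keyed on the exact termset so that two different termsets never share a counter, and it fills all tiers in a single pass over the documents. The function names, the `max_size` parameter and the rule for assigning a document to a partition are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_size=2):
    """Return termsets that occur in at least min_support documents.

    docs is a list of token lists. Every document is read exactly once.
    Assumption: one table per termset size stands in for the tiers.
    """
    # tiers[k] maps a sorted tuple of k terms to the ids of the documents
    # that contain all of those terms.
    tiers = {k: defaultdict(set) for k in range(1, max_size + 1)}
    for doc_id, tokens in enumerate(docs):
        terms = sorted(set(tokens))
        for k in range(1, max_size + 1):
            # Naive enumeration: fine for a toy example, costly for long
            # documents or large max_size.
            for termset in combinations(terms, k):
                tiers[k][termset].add(doc_id)
    return {
        termset: doc_ids
        for tier in tiers.values()
        for termset, doc_ids in tier.items()
        if len(doc_ids) >= min_support
    }


def partition_by_termset(frequent):
    """Assign each document to one frequent termset (hypothetical rule).

    Larger termsets are preferred, then higher support, then
    alphabetical order to keep the result deterministic.
    """
    ranked = sorted(
        frequent.items(),
        key=lambda item: (-len(item[0]), -len(item[1]), item[0]),
    )
    partitions = {}
    assigned = set()
    for termset, doc_ids in ranked:
        new_docs = doc_ids - assigned
        if new_docs:
            partitions[termset] = new_docs
            assigned |= new_docs
    return partitions


docs = [
    ["web", "mining", "text"],
    ["web", "mining"],
    ["text", "cluster"],
    ["web", "cluster"],
]
frequent = mine_frequent_termsets(docs, min_support=2)
partitions = partition_by_termset(frequent)
```

In this toy run, documents 0 and 1 fall into the partition described by the termset `("mining", "web")`, and each partition's termset can serve as its descriptive keywords. The grouping of partitions into final clusters, which the abstract mentions, is not sketched here because the abstract gives no detail on it.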
Authors and Affiliations
Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem