Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets
Journal Title: International Journal of Advanced Research in Artificial Intelligence (IJARAI), Year 2013, Vol. 2, Issue 6
Abstract
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and the reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). In place of the Apriori algorithm, it introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT). The algorithm uses a new methodology that generates frequent termsets by building the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the proposed hashing algorithm uses a multi-tire technique. Based on the generated frequent termsets, the documents are partitioned, and clustering is performed by grouping the partitions through their descriptive keywords. Using the MTHFT algorithm reduces both the scanning cost and the computational cost, which considerably increases performance and speeds up the clustering process. Compared with existing clustering algorithms such as Bisecting K-means and FIHC, the CWDHFT approach improved accuracy, scalability, and efficiency.
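The abstract does not specify how the multi-tire hash table is organized, so the Python sketch below only illustrates the general idea of counting termsets in a single scan and then partitioning documents by the frequent termsets they contain. It assumes one plausible reading, in which each "tier" is a separate hash table (a `dict`) holding the termsets of one size. The function names `mine_frequent_termsets` and `partition_documents`, the fractional `min_support` parameter, the `max_size` limit, and the "longest frequent termset wins" partitioning rule are all hypothetical. None of them comes from the paper, and this is not the authors' MTHFT.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_size=2):
    """Count termsets of size 1..max_size in a single pass over the documents.

    Hypothetical layout: tiers[k] is a hash table (dict) keyed by sorted
    k-term tuples, so termsets of different sizes never share a table.
    """
    tiers = {k: defaultdict(set) for k in range(1, max_size + 1)}
    for doc_id, text in enumerate(docs):  # the only scan over the corpus
        terms = sorted(set(text.lower().split()))
        for k in range(1, max_size + 1):
            for termset in combinations(terms, k):
                tiers[k][termset].add(doc_id)

    # Keep termsets whose document frequency meets the minimum support.
    n = len(docs)
    frequent = {}
    for table in tiers.values():
        for termset, doc_ids in table.items():
            if len(doc_ids) / n >= min_support:
                frequent[termset] = doc_ids
    return frequent


def partition_documents(num_docs, frequent):
    """Assign each document to the longest frequent termset it contains.

    Ties are broken by higher support, then lexicographically; documents
    containing no frequent termset go to the empty-tuple partition.
    """
    partitions = defaultdict(list)
    for doc_id in range(num_docs):
        candidates = [ts for ts, ids in frequent.items() if doc_id in ids]
        if not candidates:
            partitions[()].append(doc_id)
            continue
        best = min(candidates, key=lambda ts: (-len(ts), -len(frequent[ts]), ts))
        partitions[best].append(doc_id)
    return dict(partitions)


docs = [
    "apriori frequent termsets",
    "hashing frequent termsets",
    "kmeans clustering documents",
    "bisecting kmeans clustering",
]
frequent = mine_frequent_termsets(docs, min_support=0.5)
print(partition_documents(len(docs), frequent))
# {('frequent', 'termsets'): [0, 1], ('clustering', 'kmeans'): [2, 3]}
```

Enumerating every k-term combination grows combinatorially with document length, so this sketch is only practical for small `max_size` and short documents. The paper's actual tier structure, its collision handling, and the final step of grouping partitions by descriptive keywords are not reproduced here.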
Authors and Affiliations
Noha Negm, Passent Elkafrawy, Mohamed Amin, Abdel M. Salem