Investigate the Performance of Document Clustering Approach Based on Association Rules Mining
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2013, Vol 4, Issue 8
Abstract
The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent-termset clustering motivate our research. We devise an efficient approach for Web document clustering based on association rules mining (ARWDC). An efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT) is used to speed up association rule mining by improving the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resulting clusters. In this paper, we present an extensive analysis of the ARWDC approach for different sizes of Reuters datasets. Furthermore, the performance of our approach is evaluated using Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach are significantly improved on the Reuters datasets.
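As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how Precision, Recall and F-measure might be computed for a single cluster against a single reference class. It uses the standard definitions; the function name `precision_recall_f` and the set-of-document-ids representation are assumptions made for this example, and the abstract does not state how the authors aggregate scores across clusters.

```python
def precision_recall_f(cluster, reference_class):
    """Score one cluster against one reference class.

    Both arguments are collections of document ids (a hypothetical
    representation chosen for this sketch, not taken from the paper).
    """
    cluster = set(cluster)
    reference_class = set(reference_class)
    overlap = len(cluster & reference_class)

    # Precision: fraction of the cluster that belongs to the class.
    precision = overlap / len(cluster) if cluster else 0.0
    # Recall: fraction of the class that was captured by the cluster.
    recall = overlap / len(reference_class) if reference_class else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Example: a cluster of 4 documents, 2 of which are in a class of 6.
p, r, f = precision_recall_f({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})
```

In this example the cluster shares two documents with the class, so precision is 2/4 and recall is 2/6.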
Authors and Affiliations
Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem