Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Abstract

The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering motivate this research. We have devised an efficient approach for Web document clustering based on association rules mining (ARWDC). It uses an efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT), which improves the efficiency of association rule mining by speeding up the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resultant clusters. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, we evaluate the performance of our approach using the Precision, Recall and F-measure evaluation measures and compare it with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the ARWDC approach significantly improves efficiency, scalability and accuracy on the Reuters datasets.
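
The abstract names Precision, Recall and F-measure but does not spell out how they are computed. The sketch below assumes the definitions commonly used for clustering evaluation: for a true class i and a cluster j sharing n_ij documents, Precision = n_ij / n_j, Recall = n_ij / n_i, and F is their harmonic mean; the overall score weights the best F of each class by class size. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overall_f_measure(labels, clusters):
    """Class-size-weighted best F-measure over all clusters.

    labels[d]   -- true class of document d (assumed known, e.g. a Reuters topic)
    clusters[d] -- cluster assigned to document d by the algorithm under test
    """
    n = len(labels)
    class_sizes = Counter(labels)            # n_i
    cluster_sizes = Counter(clusters)        # n_j
    joint = Counter(zip(labels, clusters))   # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, f_measure(precision, recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Toy example: two classes, clustering that separates them perfectly.
    print(overall_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1]))
```

A score of 1.0 means every class is matched exactly by some cluster; merging all documents into one cluster lowers precision and therefore the score.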

Authors and Affiliations

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem

  • EP ID EP136006
  • DOI 10.14569/IJACSA.2013.040820

How To Cite

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem (2013). Investigate the Performance of Document Clustering Approach Based on Association Rules Mining. International Journal of Advanced Computer Science & Applications, 4(8), 142-151. https://europub.co.uk/articles/-A-136006