Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Abstract

The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent termset clustering motivate our research. We have devised an efficient approach for Web Document Clustering based on Association Rules Mining (ARWDC). An efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT) is used to improve the efficiency of mining association rules by targeting the mining of frequent termsets. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resultant clusters. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, the performance of our approach is evaluated using Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the ARWDC approach significantly improves efficiency, scalability and accuracy on the Reuters datasets.
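
The partitioning steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frequent termsets are found by brute-force counting rather than by MTHFT, partitions are built directly from termsets rather than from mined association rules, and the rule used to make partitions disjoint (keep each document in its longest matching termset) is an assumption chosen for illustration. The function names and the toy `docs` collection are hypothetical.

```python
from itertools import combinations

def frequent_termsets(docs, min_support, max_size=2):
    """Brute-force counting of termsets up to max_size (a stand-in, NOT MTHFT)."""
    counts = {}
    for terms in docs.values():
        for k in range(1, max_size + 1):
            for termset in combinations(sorted(terms), k):
                counts[termset] = counts.get(termset, 0) + 1
    n = len(docs)
    return {ts for ts, c in counts.items() if c / n >= min_support}

def initial_partitions(docs, termsets):
    """Each frequent termset collects every document containing it,
    so the same document can land in several partitions (overlap)."""
    return {ts: {d for d, terms in docs.items() if set(ts) <= terms}
            for ts in termsets}

def make_disjoint(partitions):
    """Assumed heuristic: keep each document only in its longest matching
    termset, breaking ties alphabetically."""
    best = {}
    for ts in sorted(partitions, key=lambda t: (-len(t), t)):
        for doc in partitions[ts]:
            best.setdefault(doc, ts)  # the first (longest) termset wins
    disjoint = {}
    for doc, ts in best.items():
        disjoint.setdefault(ts, set()).add(doc)
    return disjoint

def f_measure(cluster, reference_class):
    """Harmonic mean of precision and recall for one cluster/class pair."""
    hits = len(cluster & reference_class)
    if hits == 0:
        return 0.0
    precision = hits / len(cluster)
    recall = hits / len(reference_class)
    return 2 * precision * recall / (precision + recall)

# Toy collection: document id -> set of terms (hypothetical data).
docs = {
    "d1": {"oil", "price", "opec"},
    "d2": {"oil", "price"},
    "d3": {"wheat", "grain"},
    "d4": {"wheat", "grain", "price"},
}

overlapping = initial_partitions(docs, frequent_termsets(docs, min_support=0.5))
clusters = make_disjoint(overlapping)
```

In this toy run, `d1` initially appears in the partitions for `("oil",)`, `("price",)` and `("oil", "price")`; after the disjoint step it remains only in the `("oil", "price")` partition. The final grouping by descriptive keywords is not sketched here because the abstract does not describe how it works.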

Authors and Affiliations

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem

  • EP ID EP136006
  • DOI 10.14569/IJACSA.2013.040820

How To Cite

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem (2013). Investigate the Performance of Document Clustering Approach Based on Association Rules Mining. International Journal of Advanced Computer Science & Applications, 4(8), 142-151. https://europub.co.uk/articles/-A-136006