Investigate the Performance of Document Clustering Approach Based on Association Rules Mining

Abstract

The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent-termset clustering motivate our research. We have devised an efficient approach for Web document clustering based on association rules mining (ARWDC). An efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT) is used to improve the efficiency of mining association rules by targeting the frequent-termset mining step. The documents are then initially partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resultant clusters. In this paper, we present an extensive analysis of the ARWDC approach on Reuters datasets of different sizes. Furthermore, the performance of our approach is evaluated using Precision, Recall, and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the ARWDC approach significantly improves efficiency, scalability, and accuracy on the Reuters datasets.
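To make the evaluation measures concrete, the following is a minimal sketch of the overall clustering F-measure as it is commonly defined in the document clustering literature: for each true class, take the best F1 score over all clusters, then average those scores weighted by class size. The abstract does not state the exact formulation used in the paper, so this definition is an assumption, and the function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Overall F-measure of a flat clustering (assumed, common definition).

    For class i and cluster j, with n_ij documents in common:
        precision = n_ij / n_j,  recall = n_ij / n_i,
        F(i, j) = 2 * precision * recall / (precision + recall).
    The overall score is the class-size-weighted sum of max_j F(i, j).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("one cluster id is required per document")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)                # n_i
    cluster_sizes = Counter(cluster_ids)              # n_j
    overlap = Counter(zip(true_labels, cluster_ids))  # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue  # no shared documents, so F(i, j) is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Toy example with made-up topic labels, not data from the paper.
    labels = ["earn", "earn", "earn", "grain"]
    clusters = [0, 0, 1, 1]
    print(round(clustering_f_measure(labels, clusters), 4))
```

A score of 1.0 means every class is reproduced exactly by some cluster; lower values indicate classes that are split across clusters or merged with other classes.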

Authors and Affiliations

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem

  • EP ID EP136006
  • DOI 10.14569/IJACSA.2013.040820

How To Cite

Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem (2013). Investigate the Performance of Document Clustering Approach Based on Association Rules Mining. International Journal of Advanced Computer Science & Applications, 4(8), 142-151. https://europub.co.uk/articles/-A-136006