Investigate the Performance of Document Clustering Approach Based on Association Rules Mining
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2013, Vol 4, Issue 8
Abstract
The limitations of standard clustering methods and the weaknesses of the Apriori algorithm in frequent-termset clustering motivate this research. We have devised an efficient approach for Web document clustering based on association rules mining (ARWDC). It uses an efficient Multi-Tier Hashing Frequent Termsets algorithm (MTHFT), which improves the efficiency of association rule mining by speeding up the mining of frequent termsets. The documents are first partitioned based on the association rules. Since a document usually contains more than one frequent termset, the same document may appear in multiple initial partitions, i.e., the initial partitions overlap. After the partitions are made disjoint, the documents within each partition are grouped using descriptive keywords to obtain the resulting clusters. In this paper, we present an extensive analysis of the ARWDC approach for different sizes of Reuters datasets. Furthermore, the performance of our approach is evaluated using Precision, Recall and F-measure and compared with existing clustering algorithms such as Bisecting K-means and FIHC. The experimental results show that the efficiency, scalability and accuracy of the ARWDC approach are significantly improved on the Reuters datasets.
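The partitioning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents and the toy frequent termsets are all hypothetical, and the rule used to make partitions disjoint (keep each document only in the partition with the largest matching termset) is an assumption, since the abstract does not state which rule ARWDC uses.

```python
from collections import defaultdict


def initial_partitions(docs, frequent_termsets):
    """Group document ids under every frequent termset they fully contain.

    A document that contains several frequent termsets lands in several
    partitions, so the result may overlap.
    """
    partitions = defaultdict(set)
    for doc_id, terms in docs.items():
        for termset in frequent_termsets:
            if termset <= terms:  # every term of the termset occurs in the document
                partitions[termset].add(doc_id)
    return dict(partitions)


def make_disjoint(partitions):
    """Keep each document in exactly one partition.

    ASSUMPTION (not taken from the paper): prefer the largest termset and
    break ties by the alphabetical order of its sorted terms.
    """
    best = {}
    for termset, doc_ids in partitions.items():
        key = (-len(termset), sorted(termset))
        for doc_id in doc_ids:
            current = best.get(doc_id)
            if current is None or key < (-len(current), sorted(current)):
                best[doc_id] = termset
    disjoint = defaultdict(set)
    for doc_id, termset in best.items():
        disjoint[termset].add(doc_id)
    return dict(disjoint)


# Toy data, invented for illustration only.
docs = {
    "d1": {"oil", "price", "market"},
    "d2": {"oil", "price"},
    "d3": {"market", "trade"},
}
frequent_termsets = [
    frozenset({"oil"}),
    frozenset({"oil", "price"}),
    frozenset({"market"}),
]

overlapping = initial_partitions(docs, frequent_termsets)
disjoint = make_disjoint(overlapping)
```

In this toy run, document `d1` first appears in three partitions (`{oil}`, `{oil, price}` and `{market}`); after the assumed disjointness rule is applied, it remains only in the `{oil, price}` partition.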
Authors and Affiliations
Noha Negm, Mohamed Amin, Passent Elkafrawy, Abdel M. Salem