A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Document Categorization
Journal Title: International Journal of Engineering Research and Applications, 2017, Vol. 7, Issue 3
Abstract
Assigning documents to appropriate categories is a critical task for effective document retrieval. Automatic text classification is the process of assigning a new text document to one of a set of predefined categories based on its content. In this paper, we implement and compare Naïve Bayes (NB) and centroid-based algorithms for categorizing English-language text. For the centroid-based algorithm, we use the Arithmetical Average Centroid (AAC) and Cumuli Geometric Centroid (CGC) methods to compute the centroid of each class. Experiments are performed on the R-52 dataset of the Reuters-21578 corpus, and the micro-averaged F1 measure is used to evaluate the classifiers. The experimental results show that NB achieves the highest micro-averaged F1, followed by CGC and then AAC. These results may inform future research on document categorization.
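To make the three compared classifiers and the evaluation measure concrete, the following is a minimal Python sketch, not the authors' implementation. Several details are assumptions because the abstract does not state them: a lowercase whitespace tokenizer, multinomial NB with Laplace (add-one) smoothing, L2-normalized term-frequency document vectors, AAC taken as the mean of a class's document vectors and CGC as their unnormalized sum (the definitions commonly used in the centroid-classifier literature), and an unnormalized dot product as the similarity. That last choice matters: under cosine similarity the AAC and CGC centroids differ only by the positive factor 1/|C_j| and would assign identical labels. Micro-averaged F1 pools true positives, false positives and false negatives over all classes before computing a single precision, recall and F1; when every document has exactly one true and one predicted label, it equals accuracy. All function names are hypothetical, and the toy snippets are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tf_vector(text):
    # L2-normalized term-frequency vector stored as a sparse dict (assumed weighting).
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def train_centroids(docs, labels, method="AAC"):
    # AAC: mean of the class's document vectors; CGC: their plain sum (assumed definitions).
    if method not in ("AAC", "CGC"):
        raise ValueError("method must be 'AAC' or 'CGC'")
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for text, y in zip(docs, labels):
        for t, w in tf_vector(text).items():
            sums[y][t] += w
    scale = {y: (1.0 / sizes[y] if method == "AAC" else 1.0) for y in sums}
    return {y: {t: w * scale[y] for t, w in vec.items()} for y, vec in sums.items()}


def predict_centroid(centroids, text):
    # Unnormalized dot product (assumption); cosine would make AAC and CGC agree.
    d = tf_vector(text)
    return max(centroids,
               key=lambda y: sum(w * centroids[y].get(t, 0.0) for t, w in d.items()))


def train_nb(docs, labels):
    # Multinomial Naive Bayes: log class priors plus per-class term counts.
    doc_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, y in zip(docs, labels):
        tokens = tokenize(text)
        term_counts[y].update(tokens)
        vocab.update(tokens)
    n = len(labels)
    model = {y: (math.log(doc_counts[y] / n), term_counts[y], sum(term_counts[y].values()))
             for y in doc_counts}
    return model, vocab


def predict_nb(model, vocab, text):
    # Log-space scoring with Laplace (add-one) smoothing; unseen tokens are skipped.
    v = len(vocab)

    def score(y):
        log_prior, counts, total = model[y]
        return log_prior + sum(math.log((counts[t] + 1) / (total + v))
                               for t in tokenize(text) if t in vocab)

    return max(model, key=score)


def micro_f1(y_true, y_pred, classes):
    # Micro-averaging: pool TP, FP and FN over all classes, then compute one F1.
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += int(t == c and p == c)
            fp += int(t != c and p == c)
            fn += int(t == c and p != c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    # Toy snippets invented for illustration; not Reuters-21578 data.
    train = ["wheat corn grain export", "grain wheat harvest",
             "oil crude barrel price", "crude oil opec output"]
    labels = ["grain", "grain", "crude", "crude"]
    test, gold = ["wheat grain price", "opec crude oil"], ["grain", "crude"]
    model, vocab = train_nb(train, labels)
    print("NB", micro_f1(gold, [predict_nb(model, vocab, d) for d in test], set(labels)))
    for m in ("AAC", "CGC"):
        cents = train_centroids(train, labels, m)
        print(m, micro_f1(gold, [predict_centroid(cents, d) for d in test], set(labels)))
```

Because every class in this toy set has the same number of documents, AAC and CGC rank classes identically here; with the dot-product assumption, the two diverge only when class sizes differ, since CGC then gives larger classes larger centroid weights.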
Authors and Affiliations
Rupali P. Patil, R. P. Bhavsar, B. V. Pawar