A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Document Categorization
Journal Title: International Journal of Engineering Research and Applications, 2017, Vol. 7, Issue 3
Abstract
Assigning documents to related categories is a critical task for effective document retrieval. Automatic text classification is the process of assigning a new text document to one of a set of predefined categories based on its content. In this paper, we implemented and compared the Naïve Bayes (NB) and centroid-based algorithms for categorizing English-language text documents. For the centroid-based algorithm, we used two methods to calculate the centroid of each class: the Arithmetical Average Centroid (AAC) and the Cumuli Geometric Centroid (CGC). Experiments were performed on the R-52 dataset of the Reuters-21578 corpus, and the micro-averaged F1 measure was used to evaluate the classifiers. The results show that NB achieves the highest micro-averaged F1, followed by CGC, which in turn outperforms AAC. These results can serve as a reference point for future research.
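The two centroid methods named above can be illustrated with a short sketch. Following the definitions commonly used for these names, AAC takes the mean of the document vectors in a class, while CGC takes their plain sum. This is not the authors' implementation: lowercase whitespace tokenization, L2-normalized raw term-frequency vectors, and inner-product similarity are all assumptions, and the function names (`build_centroids`, `classify`, `micro_f1`) are hypothetical. The micro-averaged F1 helper assumes single-label data, where each misclassified document counts once as a false positive and once as a false negative.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Raw term-frequency vector (assumption: lowercase + whitespace tokenization)."""
    return Counter(text.lower().split())


def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (empty vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def build_centroids(docs, labels, method="AAC"):
    """Build one centroid per class from labelled training documents.

    AAC: mean of the normalized document vectors in the class.
    CGC: plain sum of those vectors (no division by class size).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for text, label in zip(docs, labels):
        class_sum = sums[label]  # create the entry even for empty documents
        for term, weight in l2_normalize(tf_vector(text)).items():
            class_sum[term] += weight
        counts[label] += 1

    centroids = {}
    for label, total in sums.items():
        if method == "AAC":
            centroids[label] = {t: v / counts[label] for t, v in total.items()}
        elif method == "CGC":
            centroids[label] = dict(total)
        else:
            raise ValueError(f"unknown centroid method: {method}")
    return centroids


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def classify(text, centroids):
    """Assign the class whose centroid has the largest inner product with the document."""
    doc = l2_normalize(tf_vector(text))
    return max(centroids, key=lambda label: dot(doc, centroids[label]))


def micro_f1(gold, predicted):
    """Micro-averaged F1 for single-label predictions.

    Pooling counts over all classes, each wrong prediction is one false positive
    (for the predicted class) and one false negative (for the gold class).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == p)
    fp = fn = len(gold) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because a CGC centroid is simply the AAC centroid multiplied by the number of documents in its class, the two would rank classes identically under cosine similarity. They can only diverge with a scale-sensitive measure such as the inner product assumed here, where CGC favours larger classes. In the single-label case, micro-averaged precision, recall and F1 all reduce to plain accuracy.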
Authors and Affiliations
Rupali P. Patil, R. P. Bhavsar, B. V. Pawar