Filter-Wrapper Approach to Feature Selection Using PSO-GA for Arabic Document Classification with Naive Bayes Multinomial
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 6
Abstract
Text categorization and feature selection are two of the many problems in text data mining. In text categorization, a collection of documents is converted into a dataset of features and classes: the words become the features and the document categories become the class labels. Too many features can degrade classifier performance, because many of them are redundant or uninformative, so feature selection is needed to choose an optimal subset. This paper proposes a feature selection strategy based on Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) for Arabic document classification with Naive Bayes Multinomial (NBM). In the first phase, PSO eliminates insignificant features and passes the reduced feature set to the next phase. In the second phase, the reduced features are further optimized with GA, an evolutionary computation method. The combined PSO-GA selection greatly reduces the number of features and achieves higher classification accuracy than using the full feature set without feature selection. In the experiments, the accuracies obtained were 85.31% for NBM, 83.91% for NBM-PSO and 90.20% for NBM-PSO-GA.
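The abstract describes the two-phase pipeline only at a high level. The Python sketch below illustrates the idea under explicit assumptions that are not taken from the paper: tokenized documents become word-count vectors, NBM uses Laplace smoothing, phase 1 is a standard binary PSO with a sigmoid transfer function, phase 2 is a simple GA that searches only over the features PSO kept, and both phases score a candidate subset with the same wrapper fitness (validation accuracy of NBM minus a small penalty on subset size). The function names (`pso_ga_select`, `binary_pso`, `genetic_algorithm` and the rest), all parameter values and the transliterated toy corpus are hypothetical and illustrative only.

```python
import math
import random
from collections import Counter

def vectorize(docs, vocab):
    """Bag of words: each tokenized document becomes a word-count vector (words = features)."""
    counts = [Counter(doc) for doc in docs]
    return [[c[w] for w in vocab] for c in counts]

def train_mnb(X, y, mask, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, using only features where mask[j] == 1."""
    feats = [j for j, m in enumerate(mask) if m]
    prior, loglik = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = math.log(len(rows) / len(X))
        counts = {j: sum(r[j] for r in rows) for j in feats}
        total = sum(counts.values()) + alpha * len(feats)
        loglik[c] = {j: math.log((counts[j] + alpha) / total) for j in feats}
    return prior, loglik

def predict_mnb(model, x):
    """Pick the class with the highest log prior + sum of count * log P(word | class)."""
    prior, loglik = model
    return max(prior, key=lambda c: prior[c] + sum(x[j] * lp for j, lp in loglik[c].items()))

def fitness(mask, Xtr, ytr, Xva, yva, penalty=0.01):
    """Assumed wrapper criterion: validation accuracy minus a penalty on the fraction of features kept."""
    if not any(mask):
        return 0.0
    model = train_mnb(Xtr, ytr, mask)
    acc = sum(predict_mnb(model, x) == t for x, t in zip(Xva, yva)) / len(yva)
    return acc - penalty * sum(mask) / len(mask)

def binary_pso(n, fit, rng, particles=10, iters=20, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO with a sigmoid transfer; each particle is a 0/1 mask (parameter values are illustrative)."""
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest, pfit = [p[:] for p in pos], [fit(p) for p in pos]
    g = max(range(particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                v = (w * vel[i][d] + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp so the sigmoid does not saturate
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fit(pos[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f > gfit:
                    gbest, gfit = pos[i][:], f
    return gbest, gfit

def genetic_algorithm(n, fit, rng, pop_size=10, gens=20, p_mut=0.05):
    """Simple GA: truncation selection, one-point crossover, bit-flip mutation, elitism."""
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([1] * n)  # seed with the whole reduced subset handed over by phase 1
    for _ in range(gens):
        ranked = sorted(pop, key=fit, reverse=True)
        parents, pop = ranked[: pop_size // 2], [ranked[0]]  # elitism: keep the best as is
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            pop.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
    best = max(pop, key=fit)
    return best, fit(best)

def pso_ga_select(Xtr, ytr, Xva, yva, seed=0):
    """Hypothetical two-phase selector: PSO over all features, then GA over those PSO kept."""
    rng = random.Random(seed)
    n = len(Xtr[0])
    def full_fit(mask):
        return fitness(mask, Xtr, ytr, Xva, yva)
    mask1, _ = binary_pso(n, full_fit, rng)
    kept = [j for j in range(n) if mask1[j]] or list(range(n))
    def expand(sub):  # map a mask over `kept` back to a mask over all n features
        mask = [0] * n
        for bit, j in zip(sub, kept):
            mask[j] = bit
        return mask
    sub, score = genetic_algorithm(len(kept), lambda s: full_fit(expand(s)), rng)
    return [j for bit, j in zip(sub, kept) if bit], score

# Hypothetical toy corpus (transliterated Arabic tokens), not the paper's dataset.
train = [["kura", "hadaf", "fi", "malaab"], ["hadaf", "kura", "min"], ["malaab", "kura", "ala", "fi"],
         ["suq", "bank", "fi", "iqtisad"], ["iqtisad", "suq", "min"], ["bank", "iqtisad", "ala", "suq"]]
ytr = ["sport"] * 3 + ["economy"] * 3
valid = [["kura", "malaab", "fi"], ["hadaf", "min", "kura"], ["suq", "iqtisad", "ala"], ["bank", "fi", "suq"]]
yva = ["sport"] * 2 + ["economy"] * 2
vocab = sorted({w for d in train for w in d})
Xtr, Xva = vectorize(train, vocab), vectorize(valid, vocab)
selected, score = pso_ga_select(Xtr, ytr, Xva, yva)
print([vocab[j] for j in selected], round(score, 3))
```

Because the GA population is seeded with the full subset handed over by PSO and the best individual is always carried forward, phase 2 in this sketch can only match or improve the phase 1 fitness on the same validation data. On a real corpus, that validation data should be kept separate from the data used to report final accuracy.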
Authors and Affiliations
Indriyani, Wawan Gunawan, Ardhon Rakhmadi