Filter-Wrapper Approach to Feature Selection Using PSO-GA for Arabic Document Classification with Naive Bayes Multinomial
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 6
Abstract
Text categorization and feature selection are two of the many problems in text data mining. In text categorization, a collection of text documents is converted into a dataset of features and classes: words become the features, and document categories become the classes. Too many features can degrade classifier performance because many of them are redundant or uninformative, so feature selection is needed to retain the most useful ones. This paper proposes a feature selection strategy based on Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) for Arabic document classification with Naive Bayes Multinomial (NBM). In the first phase, PSO eliminates insignificant features and passes the reduced feature set to the next phase. In the second phase, the reduced features are further optimized with GA, an evolutionary computation method. Combined, the two methods greatly reduce the number of features and achieve higher classification accuracy than the full feature set without feature selection. In the experiments, the accuracies obtained were 85.31% for NBM, 83.91% for NBM-PSO, and 90.20% for NBM-PSO-GA.
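The two-phase idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`nbm_accuracy`, `binary_pso`, `genetic_search`, `make_toy`), the swarm and population sizes, the inertia and acceleration constants, the mutation rate, and the use of held-out accuracy as the fitness function are all hypothetical choices, and the toy count data stands in for the Arabic corpus, so the sketch does not reproduce the reported accuracies.

```python
import math
import random

def nbm_accuracy(train, test, mask):
    """Held-out accuracy of multinomial Naive Bayes using only features where mask[i] == 1."""
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    classes = sorted({y for _, y in train})
    counts = {c: [1.0] * len(feats) for c in classes}  # Laplace smoothing
    prior = {c: 0 for c in classes}
    for x, y in train:
        prior[y] += 1
        for j, i in enumerate(feats):
            counts[y][j] += x[i]
    logp = {c: [math.log(v / sum(counts[c])) for v in counts[c]] for c in classes}
    correct = 0
    for x, y in test:
        scores = {c: math.log(prior[c] / len(train))
                  + sum(x[i] * logp[c][j] for j, i in enumerate(feats))
                  for c in classes}
        correct += max(scores, key=scores.get) == y
    return correct / len(test)

def binary_pso(fitness, n, rng, particles=8, iters=10):
    """Phase 1 (assumed form): binary PSO with a sigmoid transfer function over feature masks."""
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pfit = [fitness(p) for p in pos]
    g = max(range(particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for k in range(particles):
            for d in range(n):
                v = (0.7 * vel[k][d]
                     + 1.5 * rng.random() * (pbest[k][d] - pos[k][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-4.0, min(4.0, v))  # clamp velocity
                pos[k][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[k][d])) else 0
            f = fitness(pos[k])
            if f > pfit[k]:
                pbest[k], pfit[k] = pos[k][:], f
                if f > gfit:
                    gbest, gfit = pos[k][:], f
    return gbest, gfit

def genetic_search(fitness, seed_mask, rng, pop_size=8, gens=10, pmut=0.1):
    """Phase 2 (assumed form): elitist GA that searches only within the features PSO kept."""
    kept = [i for i, m in enumerate(seed_mask) if m]

    def expand(chrom):
        mask = [0] * len(seed_mask)
        for bit, i in zip(chrom, kept):
            mask[i] = bit
        return mask

    # The all-ones chromosome equals the PSO result, so the GA never does worse than it.
    pop = [[1] * len(kept)] + [[rng.randint(0, 1) for _ in kept] for _ in range(pop_size - 1)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(expand(c)), reverse=True)
        parents = pop[:pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(kept)) if len(kept) > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            children.append([1 - g if rng.random() < pmut else g for g in child])
        pop = parents + children
    best = max(pop, key=lambda c: fitness(expand(c)))
    return expand(best), fitness(expand(best))

def make_toy(rng, n_docs=40, n_feats=8):
    """Synthetic term counts: feature 0 marks class 0, feature 1 marks class 1, the rest is noise."""
    data = []
    for k in range(n_docs):
        y = k % 2
        x = [rng.randint(0, 3) for _ in range(n_feats)]
        x[y] += 5
        data.append((x, y))
    return data

rng = random.Random(0)
docs = make_toy(rng)
train, valid = docs[:30], docs[30:]
fitness = lambda mask: nbm_accuracy(train, valid, mask)

full_acc = fitness([1] * 8)
pso_mask, pso_acc = binary_pso(fitness, 8, rng)
ga_mask, ga_acc = genetic_search(fitness, pso_mask, rng)
print("full:", full_acc, "PSO:", pso_mask, pso_acc, "PSO-GA:", ga_mask, ga_acc)
```

Because the GA in this sketch is seeded with the PSO mask and keeps its best individuals, its validation accuracy cannot fall below the PSO result; whether that holds on unseen test data, as in the paper's reported figures, depends on the corpus and the evaluation protocol.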
Authors and Affiliations
Indriyani, Wawan Gunawan, Ardhon Rakhmadi
A Comparative Study between Time Series and Neural Network for Exchange Rate Forecasting
Abstract: Exchange rate forecasting has recently become a research topic of interest for choosing the best market strategy, for planning investment by investors in foreign projects, and for business profit. The methods or proce...
Text Extraction of Vehicle Number Plate and Document Images Using Discrete Wavelet Transform in MATLAB
Text extraction from colour images is a challenging task in computer vision. The concept of text extraction is derived from vehicle plate recognition and the extraction of individual characters. Some examples of...
Retinal Vessels Segmentation Using Supervised Classifiers for Identification of Cardio Vascular Diseases
The risk of cardiovascular diseases can be identified by measuring the retinal blood vessels. Identifying the wrong blood vessel may result in a wrong clinical diagnosis. The proposed system addresses the...
Service Cost Estimation in Cloud Environment Using a Third Party Web Server: A Comparative Analysis With and Without Using Cloud Computing
This paper maps the idea of software maintenance cost estimation process onto cloud computing service cost. We have many models for effort estimation in maintenance. Here we implement any cost estimation model to...
Empirical Study on Classification Algorithm For Evaluation of Students Academic Performance
Abstract: Data mining techniques (DMT) are extensively used in the educational field to find new hidden patterns in students' data. In recent years, the greatest issues that educational institutions are facing the unstable...