Different Type of Feature Selection for Text Classification
Journal Title: INTERNATIONAL JOURNAL OF COMPUTER TRENDS & TECHNOLOGY - Year 2014, Vol 10, Issue 2
Abstract
Text categorization is the task of deciding whether a document belongs to a set of prespecified classes of documents. Automatic classification schemes can greatly facilitate the process of categorization. Categorization of documents is challenging, as the number of discriminating words can be very large, and many existing algorithms simply do not work with this many features. For most text categorization tasks, there are many irrelevant features alongside many relevant ones. The main objective is to propose a text classification approach based on feature selection and preprocessing, thereby reducing the dimensionality of the feature vector and increasing classification accuracy. In the proposed method, text preprocessing methods are applied to different datasets, and feature vectors are then extracted for each new document using various feature weighting methods to enhance classification accuracy. The classifier is then trained with the Naive Bayesian (NB) and K-nearest neighbor (KNN) algorithms; for KNN, the prediction is made according to the category distribution among the k nearest neighbors. Experimental results show that the methods compare favorably with other methods in terms of effectiveness and efficiency.
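The pipeline outlined in the abstract (preprocessing, feature weighting, then a vote among the k nearest neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the smoothed TF-IDF weighting, the cosine similarity measure, the toy documents, and all function names are assumptions made for illustration, and the Naive Bayesian classifier is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on", "for", "as"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the training set."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen during training are ignored."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(text, train_vectors, train_labels, idf, k=3):
    """Return the majority category among the k most similar training documents."""
    query = tfidf_vector(preprocess(text), idf)
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented purely for illustration.
TRAIN = [
    ("the striker scored a late goal in the match", "sports"),
    ("the team won the league match", "sports"),
    ("the goalkeeper saved a penalty", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as interest rates rose", "finance"),
    ("investors sold shares in the bank", "finance"),
]

train_tokens = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vectors = [tfidf_vector(tokens, idf) for tokens in train_tokens]

if __name__ == "__main__":
    print(knn_predict("a late goal decided the match", train_vectors, train_labels, idf))
```

In this sketch, stop-word removal is the only dimensionality reduction step; a feature selection method would further restrict the terms kept in `idf` before the vectors are built.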
Authors and Affiliations
M. Ramya, J. Alwin Pinakas