Examining the Impact of Feature Selection Methods on Text Classification

Abstract

Feature selection, which aims to identify and select the distinctive terms that best represent a document, is one of the most important steps in classification. Feature selection reduces the dimension of the document vectors and consequently shortens processing time. In this study, feature selection methods were examined in terms of dimension reduction rates, classification success rates, and the relation between dimension reduction and classification success. kNN (k-Nearest Neighbors) and SVM (Support Vector Machines) were used as classifiers. Eight feature selection methods were tested: five standard (Odds Ratio-OR, Mutual Information-MI, Information Gain-IG, Chi-Square-CHI and Document Frequency-DF), two combined (Union of Feature Selections-UFS and Correlation of Union of Feature Selections-CUFS) and one new (Sum of Term Frequency-STF). The experiments were performed by selecting 100 to 1000 terms (in increments of 100 terms) from each class. kNN produced much better results than SVM. STF was found to be the most successful feature selection method considering the average values in both datasets. CUFS, a combined model, was found to reduce the dimension the most; accordingly, CUFS classified the documents more successfully, with fewer terms and in a shorter time, than many of the standard methods.
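As a rough illustration of per-class term selection, the sketch below scores terms with two of the standard methods named in the abstract (Chi-Square and Document Frequency), keeps the top k terms for each class, and merges the results with a plain set union. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy documents, the function names, the one-vs-rest chi-square formulation, the alphabetical tie-breaking and the plain-union combination are all hypothetical choices made here, and the paper's own definitions of UFS, CUFS and STF are not reproduced.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def class_document_frequency(docs, labels, target):
    """Document frequency of each term inside the class `target` only."""
    in_class = [d for d, y in zip(docs, labels) if y == target]
    return dict(document_frequency(in_class))


def chi_square(docs, labels, target):
    """One-vs-rest chi-square score of each term for the class `target`."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all = document_frequency(docs)
    df_pos = class_document_frequency(docs, labels, target)
    scores = {}
    for term, df_t in df_all.items():
        a = df_pos.get(term, 0)  # in class, contains term
        b = df_t - a             # other classes, contains term
        c = n_pos - a            # in class, lacks term
        d = n - n_pos - b        # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k best-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def select_per_class(docs, labels, k, scorer):
    """Select the top k terms for every class and pool them into one set."""
    selected = set()
    for target in sorted(set(labels)):
        selected.update(top_k(scorer(docs, labels, target), k))
    return selected


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "score"],
    ["vote", "election", "party"],
    ["party", "vote", "team"],
]
labels = ["sport", "sport", "politics", "politics"]

chi_terms = select_per_class(docs, labels, 3, chi_square)
df_terms = select_per_class(docs, labels, 2, class_document_frequency)
combined = chi_terms | df_terms  # plain union, assumed here for illustration

print(sorted(chi_terms), sorted(df_terms), sorted(combined))
```

Note that this form of chi-square is symmetric: a term that is strongly absent from a class scores as high as one that is strongly present, so the same term can be selected for several classes. Pooling the per-class selections into a set removes such duplicates, so the final vocabulary can be smaller than k times the number of classes.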

Authors and Affiliations

Mehmet Fatih KARACA, Safak BAYIR


  • EP ID EP259149
  • DOI 10.14569/IJACSA.2017.081250

How To Cite

Mehmet Fatih KARACA, Safak BAYIR (2017). Examining the Impact of Feature Selection Methods on Text Classification. International Journal of Advanced Computer Science & Applications, 8(12), 380-388. https://europub.co.uk/articles/-A-259149