A Strategy for Training Set Selection in Text Classification Problems

Abstract

A key issue in text classification is choosing good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of producing a successful predictor. Moreover, data are sometimes redundant or require large amounts of computing time during learning. To address this, data selection techniques have been proposed, including instance selection. Some of these techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. Their weaknesses usually include a lack of accuracy, a lack of robustness as the amount of data increases, overfitting and high complexity. This work proposes a new immune-inspired suppressive mechanism that involves selection. As a result, data that are not relevant to a classifier's final model are eliminated from the training process. Experiments show the effectiveness of this method, and the results are compared with those of other techniques; the comparison shows that the proposed method is accurate and robust for large data sets, with a less complex algorithm.
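
To make the idea of suppression-based instance selection concrete, the sketch below drops a training document when it is nearly identical to an already retained document of the same class. It is only an illustration under stated assumptions and is not the authors' algorithm: the bag-of-words representation, the cosine similarity measure, the `threshold` value and the function names (`bag_of_words`, `cosine`, `suppress_redundant`) are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as lowercase token counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suppress_redundant(samples, threshold=0.8):
    """Keep a (text, label) pair only if no retained sample of the same
    label is at least `threshold` similar to it (hypothetical rule)."""
    kept = []          # retained (text, label) pairs, in input order
    kept_vectors = []  # matching (vector, label) pairs for comparison
    for text, label in samples:
        vec = bag_of_words(text)
        redundant = any(
            kept_label == label and cosine(vec, kept_vec) >= threshold
            for kept_vec, kept_label in kept_vectors
        )
        if not redundant:
            kept.append((text, label))
            kept_vectors.append((vec, label))
    return kept


samples = [
    ("cheap pills buy now", "spam"),
    ("buy cheap pills now", "spam"),       # same words as the first spam sample
    ("meeting at noon tomorrow", "ham"),
    ("cheap pills buy now", "ham"),        # same text, different class
]
reduced = suppress_redundant(samples)
```

In this toy run the second spam document is suppressed because it has the same word counts as the first, while the identical text labelled `ham` is kept because suppression is applied only within a class. A greedy pass like this depends on input order, which is one reason published methods use more elaborate selection criteria.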

Authors and Affiliations

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken

  • EP ID EP120335
  • DOI 10.14569/IJACSA.2013.040608

How To Cite

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken (2013). A Strategy for Training Set Selection in Text Classification Problems. International Journal of Advanced Computer Science & Applications, 4(6), 54-60. https://europub.co.uk/articles/-A-120335