A Strategy for Training Set Selection in Text Classification Problems

Abstract

An issue in text classification is choosing good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of producing a successful predictor. Moreover, data are sometimes redundant or demand large amounts of computing time during learning. To overcome these issues, data selection techniques have been proposed, including instance selection. Some of these techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. Their weaknesses usually involve a lack of accuracy, a lack of robustness as the amount of data increases, overfitting and high complexity. This work proposes a new immune-inspired suppressive mechanism that involves selection. As a result, data that are not relevant to the classifier's final model are eliminated from the training process. Experiments show the effectiveness of this method, and the results are compared with those of other techniques; the comparison shows that the proposed method is accurate and robust for large data sets, with lower algorithmic complexity.

Authors and Affiliations

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken

  • EP ID EP120335
  • DOI 10.14569/IJACSA.2013.040608

How To Cite

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken (2013). A Strategy for Training Set Selection in Text Classification Problems. International Journal of Advanced Computer Science & Applications, 4(6), 54-60. https://europub.co.uk/articles/-A-120335