A Strategy for Training Set Selection in Text Classification Problems

Abstract

A central issue in text classification is the choice of good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of producing a successful predictor. Moreover, data are sometimes redundant or require large amounts of computing time during learning. To overcome these issues, data selection techniques have been proposed, including instance selection. Existing techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. Their weaknesses usually include a lack of accuracy, a lack of robustness as the amount of data increases, overfitting and high complexity. This work proposes a new immune-inspired suppressive mechanism for instance selection. With it, data that are not relevant to a classifier’s final model are eliminated from the training process. Experiments show the effectiveness of this method in comparison with other techniques: the proposed method is accurate and robust for large data sets, and the algorithm is less complex.
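
The abstract does not describe how the suppressive mechanism works, so the following is only a minimal sketch of the general idea of suppression-based instance selection, not the authors' algorithm. It assumes a bag-of-words representation, cosine similarity, and a hypothetical `threshold` parameter: a training document is dropped when it is too similar to a document of the same class that has already been kept.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map lowercased whitespace tokens to term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suppress(texts, labels, threshold=0.8):
    """Return indices of samples that survive same-class suppression.

    Assumption for illustration: a sample is redundant when its cosine
    similarity to any already-kept sample of the same class reaches
    `threshold`. The paper's actual criterion may differ.
    """
    kept = []
    kept_vectors = {}  # label -> vectors of the samples kept so far
    for i, (text, label) in enumerate(zip(texts, labels)):
        vec = bag_of_words(text)
        survivors = kept_vectors.setdefault(label, [])
        if all(cosine(vec, other) < threshold for other in survivors):
            survivors.append(vec)
            kept.append(i)
    return kept


texts = [
    "cheap pills buy now",
    "buy cheap pills now",       # same words as the first sample
    "meeting at noon tomorrow",
    "cheap flights to rome",
]
labels = ["spam", "spam", "ham", "spam"]
print(suppress(texts, labels))  # [0, 2, 3]
```

The second document is suppressed because it duplicates the first within the same class, while the remaining documents are kept. This greedy pass compares each sample against every kept sample of its class, so its result depends on the input order; that is a property of the sketch, not a claim about the published method.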

Authors and Affiliations

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken

  • EP ID EP120335
  • DOI 10.14569/IJACSA.2013.040608

How To Cite

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken (2013). A Strategy for Training Set Selection in Text Classification Problems. International Journal of Advanced Computer Science & Applications, 4(6), 54-60. https://europub.co.uk/articles/-A-120335