A Strategy for Training Set Selection in Text Classification Problems

Abstract

A central issue in text classification is choosing good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of producing a successful predictor. Moreover, data are sometimes redundant, or the learning process takes a large amount of computing time. To overcome these issues, data selection techniques have been proposed, including instance selection. Some of these techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. Their weaknesses usually involve a lack of accuracy, a lack of robustness as the amount of data increases, overfitting and high complexity. This work proposes a new immune-inspired suppressive mechanism that involves selection. As a result, data that are not relevant to a classifier's final model are eliminated from the training process. Experiments show the effectiveness of this method, and a comparison with other techniques shows that the proposed method is accurate and robust for large data sets, with lower algorithmic complexity.

Authors and Affiliations

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken

  • EP ID EP120335
  • DOI 10.14569/IJACSA.2013.040608

How To Cite

Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken (2013). A Strategy for Training Set Selection in Text Classification Problems. International Journal of Advanced Computer Science & Applications, 4(6), 54-60. https://europub.co.uk/articles/-A-120335