A Strategy for Training Set Selection in Text Classification Problems
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2013, Vol 4, Issue 6
Abstract
An issue in text classification problems involves the choice of good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of establishing a successful predictor. Moreover, sometimes data are redundant or take large amounts of computing time for the learning process. To overcome this issue, data selection techniques have been proposed, including instance selection. Some data mining techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. The weaknesses of these methods usually involve a lack of accuracy, lack of robustness when the amount of data increases, overfitting and a high complexity. This work proposes a new immune-inspired suppressive mechanism that involves selection. As a result, data that are not relevant for a classifier’s final model are eliminated from the training process. Experiments show the effectiveness of this method, and the results are compared to other techniques; these results show that the proposed method has the advantage of being accurate and robust for large data sets, with less complexity in the algorithm.
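The abstract does not spell out the suppressive mechanism, so the following is only a rough illustration of the general idea behind similarity-based suppression in instance selection, not the authors' algorithm. It assumes bag-of-words vectors, cosine similarity, and a hypothetical `threshold` parameter; the function names are likewise invented for this sketch. A training sample is dropped when it is too similar to an already kept sample of the same class.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a text into a term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suppress_redundant(samples, threshold=0.8):
    """Keep a (text, label) sample unless it is at least `threshold`-similar
    to an already kept sample with the same label.

    `threshold` is a hypothetical parameter chosen for illustration.
    """
    kept = []          # retained (text, label) pairs
    kept_vectors = []  # their term-frequency vectors, in the same order
    for text, label in samples:
        vec = bag_of_words(text)
        redundant = any(
            kept_label == label and cosine(vec, kept_vec) >= threshold
            for (_, kept_label), kept_vec in zip(kept, kept_vectors)
        )
        if not redundant:
            kept.append((text, label))
            kept_vectors.append(vec)
    return kept


if __name__ == "__main__":
    training = [
        ("cheap pills buy now", "spam"),
        ("buy cheap pills now", "spam"),       # same words, same class
        ("meeting at noon tomorrow", "ham"),
    ]
    for text, label in suppress_redundant(training):
        print(label, "|", text)
```

This greedy pass depends on the order of the samples and compares each candidate against every kept sample, so it is meant only to make the notion of removing redundant training data concrete.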
Authors and Affiliations
Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken