A Strategy for Training Set Selection in Text Classification Problems
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2013, Vol 4, Issue 6
Abstract
An issue in text classification problems is the choice of good samples on which to train the classifier. Training sets that properly represent the characteristics of each class have a better chance of yielding a successful predictor. Moreover, data are sometimes redundant or require large amounts of computing time for the learning process. To overcome this issue, data selection techniques have been proposed, including instance selection. Some of these data mining techniques are based on nearest neighbors, ordered removals, random sampling, particle swarms or evolutionary methods. Their weaknesses usually include a lack of accuracy, a lack of robustness as the amount of data increases, overfitting and high complexity. This work proposes a new immune-inspired suppressive mechanism for instance selection. As a result, data that are not relevant to a classifier’s final model are eliminated from the training process. Experiments show the effectiveness of this method, and its results are compared with those of other techniques; they show that the proposed method is accurate and robust for large data sets while having lower algorithmic complexity.
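The abstract does not describe how the suppressive mechanism works, so the following is only a hypothetical illustration of the general idea of suppression-based instance selection, not the authors' algorithm. It assumes documents are already represented as sparse term-weight vectors (for example TF-IDF) and uses an assumed rule: a training sample is suppressed when its cosine similarity to an already retained sample of the same class reaches a threshold. The names `suppress_redundant`, `cosine` and the `threshold` parameter are invented for this sketch.

```python
import math
from typing import Dict, List, Tuple

# Sparse term-weight vector (assumption: e.g. TF-IDF weights keyed by term).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def suppress_redundant(samples: List[Tuple[Vector, str]],
                       threshold: float = 0.9) -> List[int]:
    """Return indices of samples kept after suppression.

    Hypothetical rule: scan samples in order and suppress a sample if it is
    at least `threshold` cosine-similar to a previously kept sample with the
    same class label; otherwise keep it.
    """
    kept: List[int] = []
    for i, (vec, label) in enumerate(samples):
        redundant = any(
            samples[j][1] == label and cosine(vec, samples[j][0]) >= threshold
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept


# Tiny illustrative training set (made-up toy data).
docs = [
    ({"ball": 1.0, "goal": 1.0}, "sports"),
    ({"ball": 1.0, "goal": 1.0}, "sports"),     # duplicate of doc 0: suppressed
    ({"ball": 1.0, "team": 1.0}, "sports"),     # similarity 0.5 to doc 0: kept
    ({"stock": 1.0, "market": 1.0}, "finance"),
]

print(suppress_redundant(docs, threshold=0.9))  # [0, 2, 3]
```

Lowering `threshold` suppresses more samples (with `threshold=0.4`, document 2 is also dropped), trading training-set size against coverage of each class. The quadratic scan over kept samples is the simplest choice for a sketch and would need indexing to scale to large corpora.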
Authors and Affiliations
Maria Passini, Katiusca Estébanez, Grazziela Figueredo, Nelson Ebecken