Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets
Journal Title: Journal of Information and Organizational Sciences - Year 2015, Vol 39, Issue 2
Abstract
Imbalanced learning data often arise during knowledge discovery in data and present a significant challenge for data mining methods. In this paper, we investigate the influence of class-imbalanced data on the classification results of artificial intelligence methods, i.e. neural networks and support vector machines (SVM), and on the classification results of classical classification methods, represented by RIPPER and the Naïve Bayes classifier. All experiments are conducted on 30 different imbalanced datasets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository. Classification quality is measured by accuracy and the area under the ROC curve (AUC). The results indicate that the neural network and the support vector machine improve in AUC when applied to balanced data but, at the same time, deteriorate in classification accuracy. RIPPER behaves similarly, but the changes are of smaller magnitude, while the Naïve Bayes classifier shows an overall deterioration on balanced distributions. The number of instances in the presented highly imbalanced datasets has a significant additional impact on the classification performance of the SVM classifier. The results show the potential of the SVM classifier for ensemble creation on imbalanced datasets.
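As an illustration of why accuracy and AUC can move in opposite directions on imbalanced data, the following minimal sketch computes both measures on a toy dataset. The data, the two hypothetical classifiers, and the 0.5 decision threshold are assumptions made for this example only; they are not taken from the paper's experiments or from the KEEL datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count as one half). Quadratic time, which is
    acceptable for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5

# Hypothetical classifier A: always predicts the majority class.
scores_a = [0.0] * 100
pred_a = [1 if s >= 0.5 else 0 for s in scores_a]

# Hypothetical classifier B: ranks minority instances higher, at the cost
# of ten false positives at the 0.5 threshold.
scores_b = [0.1] * 85 + [0.6] * 10 + [0.8] * 4 + [0.6] * 1
pred_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(y_true, pred_a), auc(y_true, scores_a)
acc_b, auc_b = accuracy(y_true, pred_b), auc(y_true, scores_b)

print(f"A: accuracy={acc_a:.3f}  AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}  AUC={auc_b:.3f}")
```

In this toy setting, classifier A reaches higher accuracy while never detecting a minority instance, whereas classifier B has lower accuracy but a much higher AUC. This is only meant to show how the two measures can disagree; it says nothing about the magnitude of the effects reported in the paper.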
Authors and Affiliations
Goran Oreški, Stjepan Oreški
Agent-Based Modelling Applied to 5D Model of the HIV Infection
This paper proposes a multi-agent model to simulate infection by the Human Immunodeficiency Virus (HIV). Since HIV was isolated in 1983 and found to be the cause of the Acquired Immune Deficien...
Awareness of Cloud Computing in Slovenian and Croatian Micro-Enterprises
This paper presents a comparison of two studies conducted in Slovenian and Croatian micro-enterprises (µE) on the awareness of cloud computing (CC). We were interested in the issues relating to the characteris...
Comparison of DPSK and RZ-DPSK Modulations in Optical Channel with Speed of 10 Gbps
This article is devoted to the problem of error rates and modulation in optical communication. Optical waveguides show insufficiencies in high-speed transfers, manifested as corrupted transfers. Although modern technolog...
Do You Walk the Talk in Quality Culture?
We present an action research project to foster quality culture in business processes. The client setting is in the food industry, a vital sector for our society and one of the most regulated in the world. Food productio...
The Current State and Future Perspectives of the Research Information Infrastructure in Croatia
The purpose of this paper is to analyze the existing Croatian research information infrastructure and to outline a new model of the Croatian Current Research Information System (CroRIS), required for the systematical mon...