Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets

Journal Title: Journal of Information and Organizational Sciences - Year 2015, Vol 39, Issue 2

Abstract

During knowledge discovery in data, imbalanced learning data often emerges and presents a significant challenge for data mining methods. In this paper, we investigate the influence of class-imbalanced data on the classification results of artificial intelligence methods, i.e. neural networks and support vector machines (SVM), and on the classification results of classical classification methods, represented by RIPPER and the Naïve Bayes classifier. All experiments are conducted on 30 different imbalanced datasets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository. Classification quality is measured by accuracy and the area under the ROC curve (AUC). The results indicate that the neural network and the support vector machine improve on the AUC measure when applied to balanced data but, at the same time, deteriorate in classification accuracy. RIPPER shows similar results, but the changes are of a smaller magnitude, while the Naïve Bayes classifier shows an overall deterioration on balanced distributions. The number of instances in the presented highly imbalanced datasets has a significant additional impact on the classification performance of the SVM classifier. The results show the potential of the SVM classifier for ensemble creation on imbalanced datasets.
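As a rough illustration of why accuracy and AUC can move in opposite directions on imbalanced data, the following minimal sketch computes both measures for two hypothetical classifiers on a made-up 19:1 dataset. The data, the scores, and the helper names `accuracy` and `auc` are assumptions for illustration only; they are not the authors' experimental setup or the KEEL datasets.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Classifier A (assumed): always predicts the majority class
# and gives every instance the same score.
scores_a = [0.0] * 100
pred_a = [0] * 100

# Classifier B (assumed): scores all positives high, but also
# gives 10 negatives an even higher score (false alarms).
scores_b = [0.9] * 10 + [0.1] * 85 + [0.8] * 5
pred_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(y_true, pred_a), auc(y_true, scores_a)
acc_b, auc_b = accuracy(y_true, pred_b), auc(y_true, scores_b)

print(f"A: accuracy={acc_a:.3f}, AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, AUC={auc_b:.3f}")
```

In this toy case the majority-class predictor A has the higher accuracy, while B, which actually separates the classes, has the higher AUC. This is one reason studies of imbalanced data commonly report both measures.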

Authors and Affiliations

Goran Oreški, Stjepan Oreški

How To Cite

Goran Oreški, Stjepan Oreški (2015). Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets. Journal of Information and Organizational Sciences, 39(2), 209-222. https://europub.co.uk/articles/-A-485182