Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets

Journal Title: Journal of Information and Organizational Sciences - Year 2015, Vol 39, Issue 2

Abstract

Imbalanced training data often arise during knowledge discovery in data and present a significant challenge for data mining methods. In this paper, we investigate the influence of class-imbalanced data on the classification results of artificial intelligence methods, namely neural networks and support vector machines (SVM), and of classical classification methods, represented by RIPPER and the Naïve Bayes classifier. All experiments are conducted on 30 different imbalanced datasets obtained from the KEEL (Knowledge Extraction based on Evolutionary Learning) repository. Classification quality is measured by accuracy and by the area under the ROC curve (AUC). The results indicate that the neural network and the support vector machine improve on the AUC measure when applied to balanced data but, at the same time, deteriorate in classification accuracy. RIPPER behaves similarly, with changes of smaller magnitude, while the Naïve Bayes classifier deteriorates overall on balanced distributions. The number of instances in the highly imbalanced datasets has a significant additional impact on the performance of the SVM classifier. The results show the potential of the SVM classifier for ensemble creation on imbalanced datasets.
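The opposite movement of accuracy and AUC reported in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the 95/5 class split, the two hypothetical classifiers, and their scores are invented toy values, and the helper names `accuracy` and `auc` are assumptions made for this example only.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive instance
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class with a constant score.
scores_a = [0.0] * 100
pred_a = [0] * 100

# Classifier B ranks most positives above the negatives, at the cost of
# ten false alarms and one missed positive at a 0.5 threshold.
scores_b = [0.6] * 10 + [0.1] * 85 + [0.9] * 4 + [0.1] * 1
pred_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(y_true, pred_a), auc(y_true, scores_a)
acc_b, auc_b = accuracy(y_true, pred_b), auc(y_true, scores_b)

print(f"A: accuracy={acc_a:.3f}, AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}, AUC={auc_b:.3f}")
```

On this toy data, the majority-class predictor A scores higher on accuracy, while B, which actually separates the minority class, scores higher on AUC. This is one reason why reporting both measures is informative on highly imbalanced datasets.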

Authors and Affiliations

Goran Oreški, Stjepan Oreški

How To Cite

Goran Oreški, Stjepan Oreški (2015). Two Stage Comparison of Classifier Performances for Highly Imbalanced Datasets. Journal of Information and Organizational Sciences, 39(2), 209-222. https://europub.co.uk/articles/-A-485182