Empirical Assessment of Ensemble based Approaches to Classify Imbalanced Data in Binary Classification

Abstract

Classifying imbalanced data with traditional classifiers remains a major challenge. Data is imbalanced when the classes are not represented by the same number of instances. Many real-life problems have this property, e.g. Web spam detection, credit card fraud, and fraudulent telephone calls. The problem arises whenever the objective is to identify exceptional cases. Researchers handle it either by modifying existing classification methods or by developing new ones. This paper reviews ensemble-based approaches (Boosting-based and Bagging-based) designed to address class imbalance, focusing on binary classification. We compared 6 Boosting-based, 7 Bagging-based and 2 hybrid ensembles for their performance in the imbalanced domain. We used the KEEL tool to evaluate these methods on seven imbalanced datasets with class imbalance ratios ranging from 1.82 to as high as 129.44. The Area Under the Curve (AUC) is recorded as the performance metric. We also analyzed the methods statistically, using the Friedman rank test and the Wilcoxon matched-pair signed-rank test, to strengthen the visual interpretations. The analysis shows that the RusBoost ensemble outperformed every other ensemble on the imbalanced datasets.
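
The two quantities the abstract relies on, the class imbalance ratio and the AUC, can be illustrated with a minimal sketch. It assumes the imbalance ratio is the majority-class count divided by the minority-class count (the abstract does not define it), and it computes the AUC through its rank interpretation rather than through KEEL. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def auc(labels, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 8 majority instances (label 0), 2 minority (label 1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.7, 0.6, 0.9]

print(imbalance_ratio(labels))  # 4.0
print(auc(labels, scores))      # 0.9375
```

AUC is a common choice in this setting because it depends only on how the classifier ranks minority instances against majority instances, so a model that labels everything as the majority class gains nothing from the skewed class counts.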

Authors and Affiliations

Prabhjot Kaur, Anjana Gosain

DOI: 10.14569/IJACSA.2019.0100307

How To Cite

Prabhjot Kaur, Anjana Gosain (2019). Empirical Assessment of Ensemble based Approaches to Classify Imbalanced Data in Binary Classification. International Journal of Advanced Computer Science & Applications, 10(3), 48-58. https://europub.co.uk/articles/-A-498374