Empirical Assessment of Ensemble based Approaches to Classify Imbalanced Data in Binary Classification
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 3
Abstract
Classifying imbalanced data with traditional classifiers remains a major challenge. Data is imbalanced when the classes are not equally represented. Many real-life problems have this property, e.g. Web spam detection, credit card fraud, and fraudulent telephone calls. The problem arises whenever the objective is to identify exceptional cases. Researchers handle it either by modifying existing classification methods or by developing new ones. This paper reviews ensemble-based approaches (Boosting- and Bagging-based) designed to address class imbalance, focusing on binary classification. We compared 6 Boosting-based, 7 Bagging-based and 2 hybrid ensembles for their performance in the imbalanced domain. We used the KEEL tool to evaluate these methods on seven imbalanced datasets with class imbalance ratios from 1.82 to as high as 129.44. Area Under the Curve (AUC) was recorded as the performance metric. We also analyzed the methods statistically using the Friedman rank test and the Wilcoxon matched-pairs signed-rank test to support the visual interpretations. The analysis shows that the RusBoost ensemble outperformed every other ensemble on the imbalanced datasets.
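The two quantities the abstract relies on, the class imbalance ratio and AUC, can be illustrated with a minimal sketch. It assumes the imbalance ratio is the majority-class count divided by the minority-class count (the abstract does not state the definition) and computes AUC in its pairwise-ranking form. The function names and toy data are hypothetical and are not taken from the paper or from KEEL.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def auc(labels, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 8 majority examples, 2 minority examples.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7, 0.5]

print(imbalance_ratio(labels))  # 4.0
print(auc(labels, scores))      # 0.9375
```

AUC is commonly preferred over accuracy in this setting because a classifier that always predicts the majority class can reach high accuracy while ranking no better than chance.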
Authors and Affiliations
Prabhjot Kaur, Anjana Gosain