Analyzing Resampling Techniques for Addressing the Class Imbalance in NIDS using SVM with Random Forest Feature Selection

Journal Title: International Journal of Experimental Research and Review - Year 2024, Vol 43, Issue 7

Abstract

Network Intrusion Detection Systems (NIDS) protect computer networks from harmful activity. A major concern in NIDS development is the class imbalance problem: normal traffic far outnumbers intrusion attempts in network data. This imbalance can undermine the effectiveness of detection algorithms, particularly for less frequent but still highly dangerous intrusions. This paper applies resampling techniques to address class imbalance in NIDS, using a Support Vector Machine (SVM) classifier together with features selected by Random Forest to improve the feature subset selection process. The analysis highlights the competitiveness of each sampling method, offering insights into their efficiency and practicality for real-world applications. The resampling techniques analyzed are Synthetic Minority Over-sampling Technique (SMOTE), Random Under-sampling (RUS), Random Over-sampling (ROS), and two hybrid combinations, RUS+SMOTE and RUS+ROS. Feature selection was done using Random Forest, improved by Bayesian methods, to create feature subsets with feature rankings determined by Cumulative Feature Importance Score (CFIS). The CIDDS-2017 dataset is used for the performance evaluation, and the metrics are accuracy, precision, recall, F-measure and CPU time. SMOTE performs best overall across the CFIS feature subsets, with its best result at the 90% CFIS level (25 features); this subset achieves a relative accuracy improvement of 0.08% over the other approaches. The RUS+ROS technique also performs well but is somewhat slower than SMOTE. RUS+SMOTE, by contrast, shows relatively poor results, giving about 50% of the performance shown by the other methods, although it requires less computation time.
This paper's novelty is adapting the RUS method as a standalone test for screening new and potentially contaminated datasets. The standalone RUS method is more computationally efficient; it returned its best result of 98.13% accuracy at the 85% CFIS level (34 features) with a computation time of 137.812 s. SMOTE is also found to be the most proficient of the resampling techniques studied for handling class imbalance in NIDS, with the 90% CFIS feature subset. Future research could apply these techniques to different datasets and to other machine learning and deep learning methods, together with ROC curve analysis, to give NIDS designers useful guidance on selecting the right data mining tools and strategies for their projects.
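
As a rough illustration of two ideas named in the abstract, the sketch below shows one plausible reading of CFIS-based feature selection and of random under- and over-sampling. It is not the authors' implementation. It assumes that a "90% CFIS level" means keeping the top-ranked features until their normalized importance scores sum to at least 0.9, and the function names (`cfis_subset`, `random_undersample`, `random_oversample`) and the toy importance values are hypothetical. SMOTE and the SVM classifier are not reproduced here.

```python
import random
from collections import Counter


def cfis_subset(importances, threshold):
    """Keep top-ranked features until cumulative normalized importance >= threshold.

    `importances` maps feature name -> non-negative score, e.g. taken from a
    Random Forest. Treating CFIS as this cumulative sum is an assumption.
    """
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """RUS: randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """ROS: randomly duplicate rows so every class grows to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical importance scores for four made-up flow features.
    toy_importances = {"duration": 50, "bytes": 30, "packets": 15, "flags": 5}
    print(cfis_subset(toy_importances, 0.9))

    # Toy imbalanced data: 8 "normal" rows and 2 "attack" rows.
    rows = list(range(10))
    labels = ["normal"] * 8 + ["attack"] * 2
    print(Counter(random_undersample(rows, labels)[1]))
    print(Counter(random_oversample(rows, labels)[1]))
```

In a full pipeline, resampling would normally be applied to the training split only, so that the test set keeps the original class distribution.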

Authors and Affiliations

K. Swarnalatha, Nirmalajyothi Narisetty, Gangadhara Rao Kancherla, Basaveswararao Bobba

  • EP ID EP747173
  • DOI 10.52756/ijerr.2024.v43spl.004

How To Cite

K. Swarnalatha, Nirmalajyothi Narisetty, Gangadhara Rao Kancherla, Basaveswararao Bobba (2024). Analyzing Resampling Techniques for Addressing the Class Imbalance in NIDS using SVM with Random Forest Feature Selection. International Journal of Experimental Research and Review, 43(7), -. https://europub.co.uk/articles/-A-747173