Analyzing Resampling Techniques for Addressing the Class Imbalance in NIDS using SVM with Random Forest Feature Selection
Journal Title: International Journal of Experimental Research and Review - Year 2024, Vol 43, Issue 7
Abstract
The purpose of Network Intrusion Detection Systems (NIDS) is to secure computer networks and protect them from harmful actions. A major concern in NIDS development is the class imbalance problem: normal traffic far outnumbers intrusion attempts in network communication data. This imbalance can degrade the effectiveness of detection algorithms, particularly their ability to detect less frequent but highly dangerous intrusions. This paper applies resampling techniques to tackle class imbalance in NIDS, using a Support Vector Machine (SVM) classifier together with features selected by Random Forest to improve the feature subset selection process. The analysis highlights the comparative effectiveness of each sampling method, offering insights into their efficiency and practicality for real-world applications. The resampling techniques analyzed are Synthetic Minority Over-sampling Technique (SMOTE), Random Under-sampling (RUS), Random Over-sampling (ROS), and two hybrid combinations, RUS+SMOTE and RUS+ROS. Feature selection was performed using Random Forest, enhanced with Bayesian methods, to create feature subsets whose rankings are determined by the Cumulative Feature Importance Score (CFIS). The CIDDS-2017 dataset is used for performance evaluation, with accuracy, precision, recall, F-measure and CPU time as metrics. Across the CFIS feature subsets, SMOTE performs best overall, and the best result is obtained with the 90% CFIS subset of 25 features. This configuration achieves a relative accuracy improvement of 0.08% over the other approaches. RUS+ROS also performs well but is somewhat slower than SMOTE. RUS+SMOTE, by contrast, performs relatively poorly, reaching about 50% of the performance of the other methods, although it requires less computation time than the other methods.
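The five resampling schemes named above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the helper names (`resample`, `undersample`, `oversample_random`, `oversample_smote`) are hypothetical, SMOTE is reduced to its core step of interpolating between a minority row and one of its k nearest same-class neighbours (Euclidean distance), and the hybrids assume, as an illustrative choice the abstract does not specify, that RUS first shrinks larger classes to the mean class size and ROS or SMOTE then grows smaller classes to that same size. In practice a library such as imbalanced-learn would typically be used instead.

```python
import math
import random
from collections import Counter


def group_by_class(X, y):
    """Map each class label to the list of feature rows carrying that label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(list(row))
    return groups


def flatten(groups):
    """Turn a {label: rows} mapping back into parallel X, y lists."""
    X, y = [], []
    for label, rows in groups.items():
        X.extend(rows)
        y.extend([label] * len(rows))
    return X, y


def undersample(groups, target, rng):
    """RUS: each class larger than `target` keeps a random subset of `target` rows."""
    return {c: rng.sample(rows, target) if len(rows) > target else rows
            for c, rows in groups.items()}


def oversample_random(groups, target, rng):
    """ROS: each class smaller than `target` gets random rows duplicated (with replacement)."""
    return {c: rows + [list(rng.choice(rows)) for _ in range(max(0, target - len(rows)))]
            for c, rows in groups.items()}


def oversample_smote(groups, target, rng, k=5):
    """SMOTE core step: interpolate between a row and one of its k nearest same-class neighbours."""
    out = {}
    for c, rows in groups.items():
        need = max(0, target - len(rows))
        if need and len(rows) < 2:
            raise ValueError(f"class {c!r} needs at least 2 rows for SMOTE")
        synthetic = []
        for _ in range(need):
            i = rng.randrange(len(rows))
            # Indices of the other rows of this class, nearest first (Euclidean distance).
            others = sorted((j for j in range(len(rows)) if j != i),
                            key=lambda j: math.dist(rows[i], rows[j]))
            neighbour = rows[rng.choice(others[:k])]
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(rows[i], neighbour)])
        out[c] = rows + synthetic
    return out


def resample(X, y, method, seed=0):
    """Apply RUS, ROS, SMOTE, RUS+ROS or RUS+SMOTE and return the rebalanced X, y."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    sizes = [len(rows) for rows in groups.values()]
    # Hypothetical meeting size for the hybrids: the mean class size.
    mid = round(sum(sizes) / len(sizes))
    if method == "RUS":
        groups = undersample(groups, min(sizes), rng)
    elif method == "ROS":
        groups = oversample_random(groups, max(sizes), rng)
    elif method == "SMOTE":
        groups = oversample_smote(groups, max(sizes), rng)
    elif method == "RUS+ROS":
        groups = oversample_random(undersample(groups, mid, rng), mid, rng)
    elif method == "RUS+SMOTE":
        groups = oversample_smote(undersample(groups, mid, rng), mid, rng)
    else:
        raise ValueError(f"unknown method {method!r}")
    return flatten(groups)


if __name__ == "__main__":
    # Toy data: 8 "normal" rows and 2 "attack" rows with 2 features each.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [0.4, 0.2], [0.5, 0.4], [0.6, 0.5], [0.7, 0.6],
         [5.0, 5.0], [5.2, 5.1]]
    y = ["normal"] * 8 + ["attack"] * 2
    for m in ["RUS", "ROS", "SMOTE", "RUS+ROS", "RUS+SMOTE"]:
        _, yr = resample(X, y, m)
        print(m, dict(Counter(yr)))
```

On this toy data RUS ends with 2 rows per class, ROS and SMOTE with 8, and both hybrids with 5. The sketch makes the cost trade-off visible: RUS shrinks the training set the SVM would see, while the oversampling schemes enlarge it.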
The novelty of this paper is adapting the RUS method as a standalone test for screening new and potentially contaminated datasets. Standalone RUS is computationally more efficient: its best result was 98.13% accuracy on the 85% CFIS subset of 34 features, with a computation time of 137.812 s. Overall, SMOTE with the 90% CFIS feature subset proves the most effective of the resampling techniques used for handling class imbalance in NIDS. Future research could apply these techniques to other datasets and to other machine learning and deep learning methods, together with ROC curve analysis, to give NIDS designers practical guidance on selecting the right data mining tools and strategies for their projects.
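The CFIS levels referred to above (85% and 90%) can be read as cumulative-importance cut-offs. The abstract does not define CFIS precisely, so the sketch below assumes it works this way: rank features by importance, then keep the smallest top-ranked prefix whose share of total importance reaches the chosen level. The function name `cfis_subset` and the toy scores are hypothetical; in a real pipeline the scores would come from a fitted Random Forest (for example scikit-learn's `RandomForestClassifier.feature_importances_`), and the Bayesian refinement mentioned in the abstract is not modelled here.

```python
def cfis_subset(importances, threshold):
    """Select the top-ranked features whose cumulative share of total importance
    first reaches `threshold` (a fraction, e.g. 0.85 or 0.90)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        # Small tolerance so float rounding cannot leave the sum just below the threshold.
        if cumulative >= threshold - 1e-12:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical importance scores; cumulative shares are 0.40, 0.65, 0.80, 0.88, 0.95, 1.00.
    scores = {"f_a": 40, "f_b": 25, "f_c": 15, "f_d": 8, "f_e": 7, "f_f": 5}
    for level in (0.85, 0.90):
        subset = cfis_subset(scores, level)
        # 85% keeps 4 features (f_a..f_d); 90% keeps 5 (f_a..f_e).
        print(f"{level:.0%} CFIS -> {len(subset)} features: {subset}")
```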
Authors and Affiliations
K. Swarnalatha, Nirmalajyothi Narisetty, Gangadhara Rao Kancherla, Basaveswararao Bobba