SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS
Journal Title: Scientific Journal of Astana IT University - Year 2022, Vol 12, Issue 12
Abstract
Phishing remains one of the easiest ways for attackers to obtain personal information, so phishing detection has become an active research area aimed at countering such attacks. Malicious website detection is essential to prevent the spread of malware and to protect end users from becoming victims. Unfortunately, malicious URL detection remains insufficiently understood, owing to a lack of informative features and to inaccurate classification. Available sources were reviewed to investigate the subject. Building on the information collected from previous studies, this study addresses the problem of detecting phishing websites using Ensemble Learning. The aim of the work is to select the best-performing algorithm for classifying phishing websites from among boosting algorithms. AdaBoost, CatBoost, and Gradient Boosting Classifier were chosen as the Ensemble Learning algorithms and were used to improve the efficiency of the classifiers. The parameters of each algorithm were studied experimentally to find the optimal classification model. Experiments were carried out on a dataset containing information extracted from the components of a URL: main URL, domain, directory, and file. A thorough Exploratory Data Analysis (EDA) was carried out, and correlation analysis was used to identify the main dependencies and patterns that characterize phishing resources. ROC AUC score was chosen as the evaluation metric for the algorithms. The best result for predicting phishing websites was demonstrated by the AdaBoost Classifier, with an average ROC AUC score of 99%. The results of the experiments are illustrated with graphs and tables.
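To make the evaluation metric concrete, the sketch below computes ROC AUC directly from its rank interpretation: the probability that a randomly chosen phishing example receives a higher score than a randomly chosen legitimate one, with ties counted as one half. It is a minimal illustration only. The function name `roc_auc`, the label convention (1 for phishing, 0 for legitimate), and the toy scores are assumptions made for this example; they are not taken from the study's dataset or code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (assumed convention: 1 = phishing, 0 = legitimate)
    scores: classifier scores, higher meaning "more likely phishing"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical scores, not results from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

In this toy case eight of the nine positive/negative pairs are ordered correctly, so the score is 8/9. The pairwise loop is quadratic in the number of examples and is meant for clarity; on a real dataset a library routine based on sorting would be the usual choice.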
Authors and Affiliations
Dinara Kaibassova, Margulan Nurtay, Ardak Tau, Mira Kissina
DEVELOPMENT OF DAG BLOCKCHAIN MODEL
In this study the authors present an innovative approach to resolving scalability and efficiency challenges in blockchain technology through the integration of Directed Acyclic Graphs (DAGs). This approach helps to overc...
APPROACH AND STRUCTURE OF SPECIAL ORGANIZATIONAL, METHODOLOGICAL AND TECHNOLOGICAL COMPONENTS OF PROJECT AND PROGRAM PORTFOLIO MANAGEMENT SYSTEMS
The functional limitations of modern corporate project and program management systems are presented. It is shown that the main limitation of such systems is connected with the weak implementation of organizational and...
INFORMATION AND ANALYTICAL TOOLS FOR MONITORING THE PRICES OF MATERIAL AND TECHNICAL RESOURCES (MTR) OF CONSTRUCTION
The article deals with features and principles of the price monitoring system for material and technical resources operating now in the road industry. To improve the process of information collection, processing, and a...
EFFECTIVE MANAGEMENT AND OPTIMIZATION OF BUSINESS PROCESSES
This article discusses the problems of introducing an effective business, as well as optimizing business processes. Various approaches to managing and optimizing business processes are analyzed. The reasons for which it...
APPLICATION OF INFORMATION SYSTEMS AND TOOLS IN BIOINFORMATICS
The pace at which scientific data is produced and disseminated has never been as high as it is currently. Modern sequencing technologies make it possible to obtain the genome of a specific organism in a few days, and t...