SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS

Journal Title: Scientific Journal of Astana IT University - Year 2022, Vol 12, Issue 12

Abstract

Phishing is one of the easiest ways for attackers to obtain personal information, which has made phishing detection an active research area aimed at countering such attacks. Detecting malicious websites is essential to prevent the spread of malware and to protect end users from becoming victims. Malicious URL detection nevertheless remains insufficiently understood, owing to a lack of informative features and to inaccurate classification. Drawing on a review of previous studies, this work addresses the problem of detecting phishing websites using Ensemble Learning. The aim is to select the best-performing algorithm for classifying phishing websites using gradient boosting algorithms. AdaBoost, CatBoost, and Gradient Boosting Classifier were chosen as the Ensemble Learning algorithms and were used to improve classification performance. The parameters of each algorithm were studied experimentally to find the optimal classification model. Experiments were carried out on a dataset of information extracted from the contents of a URL: the main URL, domain, directory, and file. A thorough Exploratory Data Analysis (EDA) was carried out, and correlation analysis was used to identify the main dependencies and patterns that characterize phishing resources. The ROC AUC score was chosen as the evaluation metric for the algorithms. The best result for predicting phishing websites was achieved by the AdaBoost Classifier, with an average ROC AUC score of 99%. The experimental results are presented as graphs and tables.
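The evaluation metric named in the abstract, the ROC AUC score, can be read as the probability that a randomly chosen phishing sample receives a higher classifier score than a randomly chosen legitimate one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `pairwise_roc_auc` and the example labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def pairwise_roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = phishing, 0 = legitimate; scores are made-up classifier outputs.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
example_auc = pairwise_roc_auc(example_labels, example_scores)
print(f"ROC AUC: {example_auc:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

This quadratic-time form is meant only to make the metric concrete; on a real dataset a library implementation based on sorting would normally be used instead.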

Authors and Affiliations

Dinara Kaibassova, Margulan Nurtay, Ardak Tau, Mira Kissina

  • EP ID EP713377
  • DOI 10.37943/12OYRS4391

How To Cite

Dinara Kaibassova, Margulan Nurtay, Ardak Tau, Mira Kissina (2022). SOLVING THE PROBLEM OF DETECTING PHISHING WEBSITES USING ENSEMBLE LEARNING MODELS. Scientific Journal of Astana IT University, 12(12), -. https://europub.co.uk/articles/-A-713377