Data Driven based Malicious URL Detection using Explainable AI

Saranda Poddar, Deepraj Chowdhury, Ashutosh Dhar Dwivedi*, Raghava Rao Mukkamala

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

4 Citations (Scopus)

Abstract

With the ever-increasing reach of the internet and its growing accessibility through various types of devices, the spread of malware, phishing attempts, and similar threats has steadily increased, along with their level of sophistication. It is therefore important to research methods for preventing such harmful attacks on systems and users. Using a malicious URL is a common way for hackers to attack a system. To accommodate the variety of attack vectors of malicious websites, 21 features were extracted from 651,191 URLs to train the proposed model. A two-stage stacked ensemble learning model, based on gradient boosting methods and random forest, was trained and tested on a 70:30 split of the 651,191 URLs, achieving an accuracy of 97%. Explainable AI (XAI) was then used to explain the working of the model and to study the impact of each of the 21 features on the four class predictions (benign, defacement, phishing, and malware).
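The abstract does not list the 21 features, so the following is only a minimal sketch of what lexical URL feature extraction of this kind might look like. The function name `extract_features`, the feature names, and the `SUSPICIOUS_WORDS` list are hypothetical illustrations, not the authors' feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the paper's actual features are not given in the abstract.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")

# Matches a dotted-quad host such as 192.168.0.1 (octet ranges are not validated).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

# Matches a leading scheme such as "http://" or "ftp://".
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def extract_features(url: str) -> dict:
    """Return a small dictionary of illustrative lexical features for one URL."""
    # urlparse only fills in the host when a scheme or a leading "//" is present,
    # so scheme-less URLs are prefixed with "//" before parsing.
    parsed = urlparse(url if SCHEME_PATTERN.match(url) else "//" + url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(bool(IPV4_PATTERN.match(host))),
        "has_suspicious_word": int(any(w in lowered for w in SUSPICIOUS_WORDS)),
    }
```

Calling `extract_features` on each URL would yield one numeric feature row per URL, which could then be split 70:30 and passed to a classifier of the reader's choice.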

Original language: English
Title of host publication: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
Number of pages: 7
Publisher: IEEE Signal Processing Society
Publication date: 2022
Pages: 1266-1272
ISBN (Electronic): 9781665494250
Publication status: Published - 2022
Event: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022 - Virtual, Online, China
Duration: 9 Dec 2022 → 11 Dec 2022

Conference

Conference: 21st IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022
Country/Territory: China
City: Virtual, Online
Period: 09/12/2022 → 11/12/2022
Sponsor: et al., Huazhong University of Science and Technology, IEEE, IEEE Computer Society, School of Cyber Science and Engineering (CSE), HUST, TCSC IEEE
Series: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • ensemble-learning
  • explainable AI
  • gradient boosting
  • malicious URL detection
  • random forest
