Assessing usefulness of blacklists without the ground truth

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

Domain name blacklists are used to detect malicious activity on the Internet.
Unfortunately, no set of blacklists is known to encompass all malicious domains, reflecting an ongoing struggle for defenders to keep up with attackers, who are often motivated by either criminal financial gain or strategic goals.
The result is that practitioners struggle to assess the value of using blacklists, and researchers introduce errors when using blacklists as ground truth.
We define the ground truth for blacklists to be the set of all currently malicious domains and explore the problem of assessing their accuracy and coverage.
Where existing work depends on an oracle or some ground truth, this work describes how blacklists can be analysed without this dependency.
Another common approach implicitly samples blacklists, whereas our analysis covers all entries found in the blacklists.
To evaluate the proposed method, 31 blacklists were collected every hour for 56 days, containing a total of 1,006,266 unique blacklisted domain names.
The results show that blacklists differ considerably when their changes over time are considered.
We conclude that it is important to consider the aspect of time when assessing the usefulness of a blacklist.
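For illustration, the sketch below shows one way such hourly blacklist snapshots could be collected and compared over time. It is not the authors' implementation; the feed URL, directory layout, and function names are hypothetical placeholders.

```python
# Minimal sketch (assumed approach, not the paper's code) of hourly blacklist
# collection, unique-domain counting, and per-snapshot churn measurement.
import time
import urllib.request

# Hypothetical feed; the paper's 31 blacklists are not listed here.
FEEDS = {
    "example_feed": "https://example.org/blacklist.txt",  # placeholder URL
}


def fetch_snapshot(url: str) -> set[str]:
    """Download one blacklist and return its domains as a normalised set."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        lines = resp.read().decode("utf-8", errors="replace").splitlines()
    # Keep non-empty, non-comment lines; normalise to lower case.
    return {ln.strip().lower() for ln in lines if ln.strip() and not ln.startswith("#")}


def churn(previous: set[str], current: set[str]) -> tuple[int, int]:
    """Return (#domains added, #domains removed) between two hourly snapshots."""
    return len(current - previous), len(previous - current)


def collect(hours: int = 56 * 24) -> None:
    seen: set[str] = set()          # all unique domains observed so far
    last: dict[str, set[str]] = {}  # previous snapshot per feed
    for hour in range(hours):
        for name, url in FEEDS.items():
            snap = fetch_snapshot(url)
            seen |= snap
            if name in last:
                added, removed = churn(last[name], snap)
                print(f"hour={hour} feed={name} added={added} removed={removed}")
            last[name] = snap
        print(f"hour={hour} unique_domains_total={len(seen)}")
        time.sleep(3600)  # wait one hour between snapshots


if __name__ == "__main__":
    collect()
```

The per-snapshot added/removed counts are what make a time-based comparison of blacklists possible without any external ground truth.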

Details

Original language: English
Title of host publication: Image Processing and Communications Challenges 10
Number of pages: 8
Publisher: Springer
Publication date: 2018
Pages: 216-223
ISBN (Print): 978-3-030-03657-7
ISBN (Electronic): 978-3-030-03658-4
DOI
Publication status: Published - 2018
Publication category: Research
Peer-reviewed: Yes
Event: 10th International Conference on Image Processing & Communications - Bydgoszcz, Poland
Duration: 14 Nov 2018 - 16 Nov 2018

Conference

Conference: 10th International Conference on Image Processing & Communications
Country: Poland
City: Bydgoszcz
Period: 14/11/2018 - 16/11/2018
Series: Advances in Intelligent Systems and Computing
Volume: 892
ISSN: 2194-5357

Research areas

  • Domain names, blacklists, domain name system