On the ground truth problem of malicious DNS traffic analysis

Matija Stevanovic, Jens Myrup Pedersen, Alessandro D’Alconzo, Stefan Ruehrup, Andreas Berger

Research output: Contribution to journal › Journal article › Research › peer-review

18 Citations (Scopus)

Abstract

DNS is often abused by Internet criminals to provide flexible and resilient hosting of malicious content and reliable communication within their network architecture. The majority of detection methods targeting malicious DNS traffic are data-driven, most commonly having machine learning algorithms at their core. These methods require accurate ground truth of both malicious and benign DNS traffic for model training as well as for performance evaluation. This paper elaborates on the problem of obtaining such a ground truth and evaluates the practices employed by contemporary detection methods. Building upon the evaluation results, we propose a novel semi-manual labeling practice targeting agile DNS mappings, i.e., DNS queries that are used to reach a potentially malicious server characterized by fast-changing domain names and/or IP addresses. The proposed approach is developed to obtain ground truth by incorporating the operator's insight in an efficient and effective manner. We evaluate the proposed approach in a case study based on DNS traffic from an ISP network by comparing it with popular labeling practices that rely on domain name and IP blacklists and on whitelisting of popular domains. The evaluation highlights the challenges and limitations of relying on existing labeling practices and shows a clear advantage of the proposed approach in discovering a more complete set of potentially malicious domains and IP addresses. Furthermore, the novel approach achieves time-efficient labeling with limited operator involvement, which makes it promising for adoption in operational ISP networks.

Original language: English
Journal: Computers & Security
Volume: 55
Pages (from-to): 142-158
ISSN: 0167-4048
DOI: 10.1016/j.cose.2015.09.004
Publication status: Published - 2015
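
The abstract defines agile DNS mappings as DNS queries used to reach a server whose domain names and/or IP addresses change quickly. As a rough illustration only, and not the semi-manual procedure proposed in the paper, the sketch below assumes a list of (domain, IP) answers taken from one observation window of DNS traffic, groups them into connected domain/IP clusters with union-find, and flags clusters with many distinct domains or IPs as candidates for operator review. The record format, the function names `group_mappings` and `agile_candidates`, and the thresholds are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_mappings(records: Iterable[tuple[str, str]]) -> list[tuple[set[str], set[str]]]:
    """Group (domain, ip) DNS answers into connected domain/IP clusters.

    Two answers land in the same cluster if they share a domain or an IP,
    directly or transitively (union-find over domain and IP nodes).
    """
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    pairs = list(records)
    for domain, ip in pairs:
        # Prefix node names so a domain string never collides with an IP string.
        union("d:" + domain, "i:" + ip)

    # Collect the distinct domains and IPs belonging to each root.
    clusters: dict[str, tuple[set[str], set[str]]] = defaultdict(lambda: (set(), set()))
    for domain, ip in pairs:
        domains, ips = clusters[find("d:" + domain)]
        domains.add(domain)
        ips.add(ip)
    return list(clusters.values())


def agile_candidates(
    records: Iterable[tuple[str, str]], min_domains: int = 3, min_ips: int = 3
) -> list[tuple[set[str], set[str]]]:
    """Keep clusters whose domain and/or IP count reaches hypothetical thresholds."""
    return [
        (domains, ips)
        for domains, ips in group_mappings(records)
        if len(domains) >= min_domains or len(ips) >= min_ips
    ]
```

For example, answers mapping `a1.example`, `a2.example` and `a3.example` onto two shared addresses form one cluster that is flagged, while a single stable name such as `www.example.org` is not. Such candidates would still need an operator's judgement, since CDNs and load-balanced services also produce mappings with many domains or IPs.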

Keywords

  • DNS
  • Traffic analysis
  • Ground truth
  • Data labeling
  • Blacklists
  • Whitelists
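
The abstract compares the proposed approach with popular labeling practices that rely on domain name and IP blacklists and on whitelisting of popular domains. A minimal sketch of such a list-based baseline might look like the following. The function names, the use of in-memory sets for the lists, and the rule order (blacklist hits take precedence over whitelist hits, and a domain also matches through any parent domain) are illustrative assumptions, not the configuration used in the paper.

```python
from collections import Counter
from collections.abc import Iterable


def domain_suffixes(domain: str) -> list[str]:
    """Return the domain and all parent domains, e.g. a.example.com -> [a.example.com, example.com, com]."""
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def label_mapping(
    domain: str,
    ip: str,
    domain_blacklist: set[str],
    ip_blacklist: set[str],
    popular_whitelist: set[str],
) -> str:
    """Label one (domain, ip) DNS mapping as 'malicious', 'benign' or 'unknown'.

    Hypothetical rule order: a blacklist hit on the IP or on the domain (or any
    parent domain) wins; otherwise a whitelist hit on the domain or a parent
    marks it benign; everything else stays unlabeled.
    """
    suffixes = domain_suffixes(domain)
    if ip in ip_blacklist or any(s in domain_blacklist for s in suffixes):
        return "malicious"
    if any(s in popular_whitelist for s in suffixes):
        return "benign"
    return "unknown"


def label_counts(
    mappings: Iterable[tuple[str, str]],
    domain_blacklist: set[str],
    ip_blacklist: set[str],
    popular_whitelist: set[str],
) -> Counter:
    """Count how many mappings receive each label."""
    return Counter(
        label_mapping(d, ip, domain_blacklist, ip_blacklist, popular_whitelist)
        for d, ip in mappings
    )
```

Mappings that match no list stay `unknown`; the size of that remainder in real traffic is one way to see the coverage limitation the abstract attributes to list-based labeling.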

Cite this

Stevanovic, Matija ; Pedersen, Jens Myrup ; D’Alconzo, Alessandro ; Ruehrup, Stefan ; Berger, Andreas. / On the ground truth problem of malicious DNS traffic analysis. In: Computers & Security. 2015 ; Vol. 55. pp. 142-158.
@article{1e5984fd3b9d40ebb2470c2ac5199e11,
title = "On the ground truth problem of malicious DNS traffic analysis",
abstract = "DNS is often abused by Internet criminals to provide flexible and resilient hosting of malicious content and reliable communication within their network architecture. The majority of detection methods targeting malicious DNS traffic are data-driven, most commonly having machine learning algorithms at their core. These methods require accurate ground truth of both malicious and benign DNS traffic for model training as well as for performance evaluation. This paper elaborates on the problem of obtaining such a ground truth and evaluates the practices employed by contemporary detection methods. Building upon the evaluation results, we propose a novel semi-manual labeling practice targeting agile DNS mappings, i.e., DNS queries that are used to reach a potentially malicious server characterized by fast-changing domain names and/or IP addresses. The proposed approach is developed to obtain ground truth by incorporating the operator's insight in an efficient and effective manner. We evaluate the proposed approach in a case study based on DNS traffic from an ISP network by comparing it with popular labeling practices that rely on domain name and IP blacklists and on whitelisting of popular domains. The evaluation highlights the challenges and limitations of relying on existing labeling practices and shows a clear advantage of the proposed approach in discovering a more complete set of potentially malicious domains and IP addresses. Furthermore, the novel approach achieves time-efficient labeling with limited operator involvement, which makes it promising for adoption in operational ISP networks.",
keywords = "DNS, Traffic analysis, Ground truth, Data labeling, Blacklists, Whitelists",
author = "Matija Stevanovic and Pedersen, {Jens Myrup} and Alessandro D’Alconzo and Stefan Ruehrup and Andreas Berger",
year = "2015",
doi = "10.1016/j.cose.2015.09.004",
language = "English",
volume = "55",
pages = "142--158",
journal = "Computers \& Security",
issn = "0167-4048",
publisher = "Elsevier",

}

On the ground truth problem of malicious DNS traffic analysis. / Stevanovic, Matija; Pedersen, Jens Myrup; D’Alconzo, Alessandro; Ruehrup, Stefan; Berger, Andreas.

In: Computers & Security, Vol. 55, 2015, p. 142-158.

TY  - JOUR
T1  - On the ground truth problem of malicious DNS traffic analysis
AU  - Stevanovic, Matija
AU  - Pedersen, Jens Myrup
AU  - D’Alconzo, Alessandro
AU  - Ruehrup, Stefan
AU  - Berger, Andreas
PY  - 2015
Y1  - 2015
AB  - DNS is often abused by Internet criminals to provide flexible and resilient hosting of malicious content and reliable communication within their network architecture. The majority of detection methods targeting malicious DNS traffic are data-driven, most commonly having machine learning algorithms at their core. These methods require accurate ground truth of both malicious and benign DNS traffic for model training as well as for performance evaluation. This paper elaborates on the problem of obtaining such a ground truth and evaluates the practices employed by contemporary detection methods. Building upon the evaluation results, we propose a novel semi-manual labeling practice targeting agile DNS mappings, i.e., DNS queries that are used to reach a potentially malicious server characterized by fast-changing domain names and/or IP addresses. The proposed approach is developed to obtain ground truth by incorporating the operator's insight in an efficient and effective manner. We evaluate the proposed approach in a case study based on DNS traffic from an ISP network by comparing it with popular labeling practices that rely on domain name and IP blacklists and on whitelisting of popular domains. The evaluation highlights the challenges and limitations of relying on existing labeling practices and shows a clear advantage of the proposed approach in discovering a more complete set of potentially malicious domains and IP addresses. Furthermore, the novel approach achieves time-efficient labeling with limited operator involvement, which makes it promising for adoption in operational ISP networks.
KW  - DNS
KW  - Traffic analysis
KW  - Ground truth
KW  - Data labeling
KW  - Blacklists
KW  - Whitelists
UR  - http://authors.elsevier.com/a/1Rmhz_3pcoT0FT
U2  - 10.1016/j.cose.2015.09.004
DO  - 10.1016/j.cose.2015.09.004
M3  - Journal article
VL  - 55
SP  - 142
EP  - 158
JO  - Computers & Security
JF  - Computers & Security
SN  - 0167-4048
ER  - 