On generating network traffic datasets with synthetic attacks for intrusion detection

Carlos Garcia Cordero; Emmanouil Vasilomanolakis; Aidmar Wainakh; Max Mühlhäuser; Simin Nadjm-Tehrani

doi:10.1145/3424155

On generating network traffic datasets with synthetic attacks for intrusion detection

Carlos Garcia Cordero, Emmanouil Vasilomanolakis, Aidmar Wainakh, Max Mühlhäuser, Simin Nadjm-Tehrani

Publikation: Bidrag til tidsskrift › Tidsskriftartikel › Forskning › peer review

33 Citationer (Scopus)

71 Downloads (Pure)

Abstract

Most research in the field of network intrusion detection heavily relies on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with similar characteristics as the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics to enable an accurate replication of scientific results. Intrusion Detection Dataset Toolkit (ID2T) facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks created by ID2T blend with the background traffic by mimicking the background traffic's properties. This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T's functionality provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community to expand its arsenal of attacks and capabilities.

Originalsprog	Engelsk
Artikelnummer	8
Tidsskrift	ACM Transactions on Privacy and Security
Vol/bind	24
Udgave nummer	2
Antal sider	39
ISSN	2471-2566
DOI	https://doi.org/10.1145/3424155
Status	Udgivet - jan. 2021

Bibliografisk note

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
2471-2566/2020/12-ART8 $15.00
https://doi.org/10.1145/3424155

Adgang til dokumentet

10.1145/3424155Licens: CC BY-NC 4.0

Open Access ArticleForlagets udgivne version, 7,08 MBLicens: CC BY-NC 4.0

AUB Link

Søg efter materialet i Aalborg Universitetsbiblioteks søgemaskine

Andre filer og links

Link to publication in Scopus

Citationsformater

@article{84d452ec99964ddcaebda554e68108a6,

title = "On generating network traffic datasets with synthetic attacks for intrusion detection",

abstract = "Most research in the field of network intrusion detection heavily relies on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with similar characteristics as the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics to enable an accurate replication of scientific results. Intrusion Detection Dataset Toolkit (ID2T) facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks created by ID2T blend with the background traffic by mimicking the background traffic's properties. This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T's functionality provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community to expand its arsenal of attacks and capabilities. ",

keywords = "Intrusion detection systems, attack injection, datasets, synthetic dataset",

author = "Cordero, {Carlos Garcia} and Emmanouil Vasilomanolakis and Aidmar Wainakh and Max M{\"u}hlh{\"a}user and Simin Nadjm-Tehrani",

note = "Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. {\textcopyright} 2020 Association for Computing Machinery. 2471-2566/2020/12-ART8 $15.00 https://doi.org/10.1145/3424155",

year = "2021",

month = jan,

doi = "10.1145/3424155",

language = "English",

volume = "24",

journal = "ACM Transactions on Privacy and Security",

issn = "2471-2566",

publisher = "Association for Computing Machinery",

number = "2",

}

TY - JOUR

T1 - On generating network traffic datasets with synthetic attacks for intrusion detection

AU - Cordero, Carlos Garcia

AU - Vasilomanolakis, Emmanouil

AU - Wainakh, Aidmar

AU - Mühlhäuser, Max

AU - Nadjm-Tehrani, Simin

N1 - Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2020 Association for Computing Machinery. 2471-2566/2020/12-ART8 $15.00 https://doi.org/10.1145/3424155

PY - 2021/1

Y1 - 2021/1

N2 - Most research in the field of network intrusion detection heavily relies on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with similar characteristics as the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics to enable an accurate replication of scientific results. Intrusion Detection Dataset Toolkit (ID2T) facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks created by ID2T blend with the background traffic by mimicking the background traffic's properties. This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T's functionality provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community to expand its arsenal of attacks and capabilities.

AB - Most research in the field of network intrusion detection heavily relies on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets or at least datasets with similar characteristics as the ones used in related work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics to enable an accurate replication of scientific results. Intrusion Detection Dataset Toolkit (ID2T) facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected synthetic attacks created by ID2T blend with the background traffic by mimicking the background traffic's properties. This article has three core contributions. First, we present a comprehensive survey on intrusion detection datasets. In the survey, we propose a classification to group the negative qualities found in the datasets. Second, the architecture of ID2T is revised, improved, and expanded in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T's functionality provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community to expand its arsenal of attacks and capabilities.

KW - Intrusion detection systems

KW - attack injection

KW - datasets

KW - synthetic dataset

UR - http://www.scopus.com/inward/record.url?scp=85100823426&partnerID=8YFLogxK

U2 - 10.1145/3424155

DO - 10.1145/3424155

M3 - Journal article

SN - 2471-2566

VL - 24

JO - ACM Transactions on Privacy and Security

JF - ACM Transactions on Privacy and Security

IS - 2

M1 - 8

ER -

On generating network traffic datasets with synthetic attacks for intrusion detection

Abstract

Bibliografisk note

Adgang til dokumentet

AUB Link

Andre filer og links

Fingeraftryk

Citationsformater